INTELLIGENCE TESTING AND THE NATURE/NURTURE
DEBATE, 1928-1978: WHAT NEXT?*†


By P. E. VERNON 
(Department of Educational Psychology, University of Calgary, Alberta) 


SUMMARY. Changing ideas on intelligence testing and the heritability of intelligence are
followed through a fifty-year period. Common criticisms of intelligence tests are
examined, but it is concluded that intellectual tests will continue to be of value in 
diagnosing the strengths and weaknesses, particularly of exceptional children. 

The arguments about the extent to which heredity and environment contribute to 
the development of intelligence are traced through a turbulent period. The evidence 
presented is consistent with an environmental contribution of over 20 per cent of the vari- 
ance, although the precise value is not considered to be as important as recognising that 
both genes and environment have very substantial effects on the measured intelligence of 
children. The inaccuracies detected in some of the research reports on the heritability 
of intelligence are not considered to invalidate the argument as a whole. 


THE MEASUREMENT OF INTELLIGENCE 


IN 1928, Spearman, Burt, Drever and Thomson reigned supreme in British psychology. 
The Stanford-Binet scale for children had come into general use; group tests had 
demonstrated their value in the American army; and many such tests as the Moray 
House series were available for children. Despite the bitter controversies between 
Spearman and Thomson, there was fairly general agreement on the following principles 
underlying intelligence testing. 

First it was assumed that intelligence is a recognisable attribute which is respon- 
sible for differences among children and adults in their learning, reasoning, and other 
cognitive capacities. It is an homogeneous entity or mental power which, like height 
or weight, can vary in amount, or in rate of growth or decline, but is essentially stable 
in its nature throughout life. Secondly, although obviously it is not measurable in the 
same sense as physical attributes like height, yet the principle of sampling appropriate 
mental tasks and standardising or norming scores against the distribution in the general 
population yields IQs which can be accepted as quantitative measures of level of 
intelligence. Thirdly, intelligence is essentially innate, being determined by the genes 
that the child inherits from his parents; hence it develops or matures with age, 
irrespective of the environment in which he is reared. It reaches its maximum by 
around 15 years and then stays constant until senility sets in. Thus the IQ obtained 
from a reliable intelligence test in childhood indicates the educational and vocational 
level that the person can be expected to attain in his later school career and in adult 
life. Burt, writing on general intelligence in 1933, said: “Fortunately it can be
measured with accuracy and ease”.


But in 1978 all these statements, though perhaps containing some grain of truth, 
would be hotly contested by the great majority of psychologists. How is it that the 
testing movement, long regarded as a major achievement of applied psychology, and 
accepted by most laymen as veridical, is now so widely distrusted and criticised, and 
is even in some danger of abolition in the United States, where it once flourished most 
luxuriantly? Several States have passed, or at least considered, laws to ban the use of
IQ tests in schools, on the grounds that they are culturally biased and do not accurately
measure intelligence. Many American parents have successfully challenged in the
courts the allocation of their children to special schools or classes on the basis of low
IQs. It has also been ruled in some suits that employers cannot refuse to employ
blacks or others who obtain low test scores, unless there is clear evidence that
suitability for the job depends on what the tests measure. The main attack has been on
large-scale group testing, and there is less interference as yet with the use of individual
tests for clinical diagnostic purposes. But the latter too has been criticised, and some
school psychologists have been forced to substitute the Illinois Test of Psycholinguistic
Abilities, or tests of concept formation or of Piagetian stages, which measure much the
same thing rather less effectively and conveniently but which avoid the naughty word
‘intelligence’. I need hardly mention the similar decline in group intelligence testing
in the UK with the virtual demise of the 11+.

* This address was given by Professor Vernon in Edinburgh in September, 1978, in celebration
of the 50th Anniversary of the founding of the Scottish Council for Research in Education.
† © SCRE, 1979. This address is reprinted by permission of SCRE. Further copies may be
obtained from SCRE in booklet form under the title “Intelligence Testing, 1928-78: What Next?”
(ISBN 0 901116 16 5, Price 70p).

In the early 1920s the average scores of American army recruits of different 
national or ethnic descent showed that men of Anglo-American or Northwest 
European descent exceeded those of Southern and Eastern European stock; and 
American negroes scored lower still. But the objection was soon raised that these 
differences resulted from the economic and educational advancement of the different 
groups rather than from innate differences in ability. In 1928, Freeman, Holzinger 
and Mitchell’s, also Burks’s, studies of foster children were published, suggesting that 
the adoption of orphans into good foster homes brought about significant, even if 
limited, improvements in their mean IQs. The work of Hugh Gordon (1923) with 
canal boat and gipsy children, also of N. D. M. Hirsch (1928) with children living in 
isolated rural regions of Kentucky, strengthened the suspicion that IQs were more 
susceptible to environmental advantage or disadvantage than such pioneers as Terman 
and Burt believed. 


Then in 1937, Newman, Freeman and Holzinger published the first investigation 
of identical twins reared apart, which demonstrated that the correlation among 
monozygotic pairs was higher than that among non-identicals (thus demonstrating 
genetic influence), yet it was considerably lower than the correlation among pairs 
reared together (presumably on account of their different environments). At about 
the same time, R. L. Thorndike (1933) pointed out that IQs are much less stable over 
time than was generally believed. Although individual tests given a week apart
correlated at about 0.95, the reliability coefficients over five years dropped to around
0.70. Thus intelligence was by no means fixed for life, and the later longitudinal
studies by Bayley (1949) and others showed that developmental or intelligence 
quotients obtained in the first three years or so of life bore scarcely any relation to later 
childhood or adult IQ. 


TABLE 1


CORRELATIONS BETWEEN IQS FROM VARIOUS PRESCHOOL OR INTELLIGENCE
TESTS, AT DIFFERENT AGES (FROM N. BAYLEY)


Age at          Number of Years until Retest
first test       1        3        6        12

3 mths.         0.10     0.05    -0.13     0.02
1 yr.           0.47     0.23     0.13     0.00
2 yrs.          0.74     0.55     0.50     0.42
3 yrs.          0.64      --      0.55     0.33
4 yrs.           --      0.71     0.73     0.70
6 yrs.          0.86     0.84     0.81     0.77
7 yrs.          0.88     0.87     0.73     0.80
9 yrs.          0.88     0.82     0.87      --
11 yrs.         0.93     0.93     0.92      --
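
One way to read Table 1 is to square each correlation, which on the usual linear-regression interpretation gives the proportion of variance in the later score that the earlier score predicts. The sketch below does this for the 12-year retest column only; the dictionary and function names are illustrative, and the figures are simply those of the table.

```python
# Correlations between a first test and a retest 12 years later,
# keyed by age at first test (values copied from Table 1;
# ages with no 12-year entry are omitted).
retest_12_years = {
    "3 mths": 0.02,
    "1 yr": 0.00,
    "2 yrs": 0.42,
    "3 yrs": 0.33,
    "4 yrs": 0.70,
    "6 yrs": 0.77,
    "7 yrs": 0.80,
}


def shared_variance(r):
    """Proportion of variance shared by two measures that correlate r (r squared)."""
    return r * r


shared = {age: round(shared_variance(r), 2) for age, r in retest_12_years.items()}

for age, proportion in shared.items():
    r = retest_12_years[age]
    print(f"{age:>6}: r = {r:.2f}, shared variance = {proportion:.2f}")
```

On this reading a test given at 4 years accounts for roughly half the variance of the score obtained twelve years later, while tests given in the first year account for essentially none.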
In 1949, Hebb's influential book, The Organization of Behaviour, seemed to
provide a reconciliation between the opposed views of hereditarians and environ- 
mentalists, by distinguishing what he called Intelligence A and Intelligence B. Intel- 
ligence A was the genetically determined plasticity of the central nervous system which 
was necessary for any mental growth; but this could not be observed directly nor 
measured. What we do observe and measure fairly effectively is the current level of 
all-round mental efficiency, Intelligence B, and this depends on the interaction between 
the genetic potential and the stimulating or inhibiting effects of the environment in 
which the individual has been reared. Hebb’s experiments on dogs and rats reared in 
different environments supported this interpretation. At about this time, too, Piaget's 
work on child development was becoming widely known in Britain and America, and 
his book on the Psychology of Intelligence (1950) likewise stressed the role of physical 
and social environment, as well as neurological maturation, in the development of 
operational thinking. This interactionist theory is now accepted by almost all 
psychologists except for a few rabid environmentalists; but unfortunately, of course, 
it still leaves scope for a wide range of opinions on the relative importance of hereditary 
and environmental determination. I shall describe later some of the more recent, and 
convincing, studies of this problem. 


The other main line of attack on intelligence testing came from the factor analysts 
who, following Thurstone and Guilford, claimed that intelligence is not a unitary 
general ability, but a collection of numerous independent kinds of ability or primary 
factors, Verbal, Spatial, Number, etc. Now the multifactorial model works quite well 
when batteries of varied tests are given to selected, homogeneous populations such as 
college students. But as Thurstone found when he extended his investigations to 
younger age groups, one or more second-order factors appeared which, he admitted, 
corresponded to Spearman's ‘g’. In an heterogeneous population such as army
recruits, or an age group of children, the general factor accounts for at least 50 per 
cent of the variance, and additional more specialised abilities for not more than about 
25 per cent. Burt has shown that the ‘g’ plus group factor model is mathematically
equivalent to the Thurstone multiple factor solution. They are just alternative, and 
mutually convertible, ways of classifying abilities. Thurstone’s view appeared to be 
superior in practical utility because it provided a profile of scores on half a dozen or 
more separate ability factors, rather than just the single global IQ plus some small 
additional group factors. But the trouble is that the differential or pure-factor scores 
derived from Thurstone's model just did not differentiate. Far too many testees score 
high on all of them, or low on all. Thus we find in practice that, despite all the efforts 
put into factorial investigations and techniques over the past 70 years, almost all 
applied psychologists working in school, clinic, or occupational fields make use chiefly 
of a single general test, or, quite often, of a two-pronged test like Wechsler's verbal 
and performance, or the verbal and non-verbal Lorge-Thorndike or Alice Heim tests, 
or the verbal and quantitative College Entrance Board tests. Numerous critics of 
intelligence testing allege that these general tests are worthless because different 
factorists put forward contradictory theories of ability structure, and there is no 
agreement on the nature of intelligence or ‘g’ (cf. Block and Dworkin, 1974). But in
fact the general factor is so prominent that quantitative studies of heritability do give 
fairly consistent results, whereas attempts to study the genetic and environmental 
components of Thurstone primary factor measures have yielded hopelessly con- 
tradictory results. 


THE CRITICISMS CONSIDERED 


I turn now to the common criticisms raised by people who oppose testing,
whether for ideological or other reasons. How far can these be answered?


First, it is often easy to pick out particular test items for ridicule, and to say that 
these don't measure ‘real’ intelligence. Actually, of course, all items will usually have
been validated against total score, though it is true that some get out of date and need
revision, e.g. “In what way are coal and wood alike?” Though we can’t reach any
precise operational definition of intelligence, the evidence from factor analysis and 
other sources does show that items are measuring a consistent and important com- 
ponent of ability. But I should have to agree that most of the item-types have 
developed haphazardly without sufficiently clear rationale. Perhaps we could arrive 
at a better sampling of mental efficiency through advances in our knowledge of 
information processing, as Resnick and her co-authors suggest in The Nature of 
Intelligence (1976). 

Second, group tests are often given by untrained laymen, under poorly controlled 
conditions; they may be mis-scored and misinterpreted. With this I should tend to 
agree. Although certainly I have made much use of group tests, their results have 
always seemed to me too chancy to be of great value in individual diagnosis. And this 
is borne out by the considerably greater stability of individual test IQs over time. For 
example, the correlation of Binet 11-year IQs with 17-18-year IQs is about 0.85 to
0.90, whereas verbal group tests correlate 0.75, and non-verbal group tests about 0.60
(Hopkins and Bracht, 1975). Individual testers are, of course, better trained and can 
standardise administration more adequately, though they too have been shown to have 
their idiosyncrasies, such as varying standards of evaluating children’s responses. And 
it is highly probable that they are susceptible to halo or expectancy effects: that is they
judge from what the school tells them, or from impressions obtained in initial con- 
versations, that the child is bright or dull, and this influences their administration and 
scoring. On the other hand there is no justification whatever for the teacher expect- 
ancy effects alleged by Rosenthal and Jacobson (1968). Their experiment purporting 
to show that children who are reported to teachers as being bright obtain significant 
IQ gains over the next few months was full of technical faults. And numerous attempts
at replication have led to completely negative results (see Elashoff and Snow, 1971). 


Third, test results depend on practice or coaching. Agreed that significant rises 
have been demonstrated, but they are limited, and they can be largely overcome by 
giving all children concerned adequate preliminary practice. The individual tester can 
usually guess when the same test has been used recently. Where this weakness 
becomes really serious is in cross-cultural group testing of children or adults who have 
no previous experience of tests. 

Fourth, test results depend on motivation. Obviously, it is said, the child who 
is confident, co-operative and interested will do better than one who is anxious, bored, 
distractible, or has a negative self-concept. Actually it has been difficult to get much 
confirmatory evidence, except in child guidance cases, where the tester can gauge the 
child’s co-operation, and can do a good deal to build up suitable rapport. She will 
also usually note if the IQ is likely to be unreliable through lack of motivation. But 
the group tester, I must admit, can do very little to stimulate the motivation of all the 
children in a class. Some experiments (e.g. Benton, 1936) have indicated that extra 
motivation has little effect; for instance offering monetary rewards for good perform- 
ance may stimulate children to try more items, but not to get more of them right. 
We also found in the army that the scores of women recruits were not affected by 
minor illnesses such as colds, feeling off colour, or menstrual periods. I should agree 
that this point too is of much greater importance in cross-cultural testing, where the 
testees may be suspicious or anxious about a foreign tester. 

Fifth, I suppose the most common and most misguided criticism of intelligence 
tests is that they merely measure acquired information and skills. Items such as the 
following from Binet or WISC are often quoted. 


Who wrote Romeo and Juliet? 
What is a hieroglyphic? 
What is the thing to do if another boy (girl) hits you without meaning to?


The implication in using these illustrative items is that obviously slum children 
have had much less opportunity than middle-class children to hear about the first two, 
while the third represents knowledge of moral conventions in middle-class society, and 
the typical response from lower-class children might be very different. But it is false 
to say that vocabulary and verbal skills are acquired, in the sense that anybody could 
acquire them, if taught. They are developed in the same way as other aspects of 
Intelligence B. Children don’t usually know or use difficult words unless they have 
reached the level of mental maturity sufficient to understand the concepts to which the 
words refer. Actually E. L. Thorndike published a careful investigation of just this 
point in 1927. He gave three tests involving vocabulary and informational skills, and 
three tests involving verbal and mathematical reasoning skills, to Grade 8 boys, and 
found that the correlations between the two types of test were just as high as those 
within either type. In other words, the information tests were measuring reasoning 
capacity as effectively as did tests designed to sample reasoning. However, I would 
agree that most test constructors nowadays do tend to avoid items which seem to 
involve cultural bias, and to rely more on items whose difficulty depends on com- 
plexity of information processing. 


Because of the importance of this topic, I shall quote a more recent study by 
Arthur Jensen (1974) of test bias. The Peabody Picture Vocabulary, which appears 
likely to be culturally loaded, and the Raven Matrices, which appears relatively 
culture-fair, were given to 600 white and 500 black pupils in Californian schools. On 
both tests the whites, as usual, obtained higher mean scores, the black performance 
being equivalent to that of whites a year or two younger. But there were no other 
features of the responses to indicate any differences between the groups. The reliabili- 
ties and the factorial content of tests and items were the same in blacks and whites, and 
the rank orders of item difficulty were almost identical. That is, there were no items 
that were relatively more or less difficult for one group than the other. Also the blacks 
were not handicapped more on vocabulary than on Matrices. Many other studies 
have been carried out by various authors with college students, showing that Scholastic 
Aptitude and other types of intelligence tests predict college grades among blacks in 
just the same way as they do among whites (see Hunter and Schmidt, 1976). The 
tests are more difficult for low economic class or minority group students, and this 
difference is doubtless in part due to deprived conditions of upbringing. But it is just 
not true that the tests are unfairly loaded against particular sub-groups reared within 
western societies.


My sixth point arises from the admittedly imperfect reliability of intelligence 
tests. Critics often report atrocity stories about children whose educational and 
vocational careers were permanently blighted because they obtained a low IQ soon 
after entry to primary schooling. I would not say that this cannot occur (particularly 
if group tests are used), but no one has compared the incidence of faulty prognoses as 
against the incidence of valid ones, where the information obtained by an intelligence 
test has led to much more appropriate educational provisions than merely relying on 
parents’ or teachers’ judgments. In our Calgary school system, it is usual for special 
class or special school children to be retested individually every two years, in case they 
do show sufficient improvement to justify return to ordinary schooling. 


Last of these common criticisms: Cronbach (1975) points out that the very 
success of intelligence and educational testing has contributed to their downfall. The 
public feels threatened because more and more of their own, or their children’s, 
educational and occupational careers are decided by tests, whose content and the 
scores they yield are kept secret. Many people would prefer the old-fashioned 
approach of school examinations, interviews, and decisions based on academic and 
occupational record with which they are familiar, and it is difficult to convince them
that these are even less accurate. Then, too, there is the revolt against invasions of 
privacy at a time when so much personal information is being computerised; thus
many parents demand access to the psychologist’s files. The psychologist, however,
regards this information as confidential, just as the doctor does, and he refuses to give out
IQs or other scores because they would so readily be misunderstood. Further, how
could he carry out clinical examinations or treatment without keeping records of 
family circumstances that the parents might resent? I should like to add too that 
psychologists such as Anastasi (1968), Cronbach (1970), Sattler (1974) and others are 
very well aware of the defects and dangers of testing to which I have drawn attention; 
and they do their best by their textbooks and university courses to train teachers and 
future psychologists to improve their usage of tests. 


THE NATURE/NURTURE DEBATE 


I shall now go on to some of the major investigations of recent years which supply 
strong evidence of the influence of both genetic and environmental factors on chil- 
dren’s intellectual growth. On the genetic side, the greatest weight has been put on 
kinship data, especially on identical twins reared apart. In four separate studies,* 
which yielded only 122 pairs in all, the mean inter-twin correlation was 0.82, and this
figure can be taken as an estimate of the heritability of IQ, or percentage of genetic
variance, the remaining 18 per cent being environmental. However, the largest single
group was the 53 pairs collected by Sir Cyril Burt, and his correlation of 0.88 suggested
that separated identicals are almost as much alike in intelligence as identicals brought 
up in the same home. In 1972, L. J. Kamin, of Princeton University, drew attention to 
certain discrepancies in Burt's published figures, and in 1974 Jensen issued a complete 
list of all such apparent inconsistencies. Several of these were probably miscopyings, 
but others were more serious, such as reporting identical correlation coefficients for
different-sized groups of twins, suggesting that Burt had not bothered to recalculate 
when he gathered additional cases. These lapses have been blown up by Kamin (1974), 
and by Gillie (1976) in the UK, into an accusation that Burt's data were faked, and 
that none of his findings are of any scientific value, since we cannot know whether they 
contain further distortions. I cannot myself regard Burt's work as fraudulent; almost
all the discovered inconsistencies are so stupid that he would surely have made his 
results much more plausible if he had been intentionally faking. I would agree that he 
was more careless about such details than research psychologists are nowadays; also 
that he was so strongly wedded to genetic explanations that the planning of his 
investigations, and his methods and analyses may have been biased at times. For 
example, the techniques he used for assessing the intelligence of adult relatives, and the 
procedures for getting what he called children's adjusted IQs, seem to have been highly 
subjective. It appears then that we shall have to jettison his individual test data, but 
this does not mean, as the critics claim, that the whole foundations of heritability of 
intelligence are washed away. Other, more scrupulous investigators have obtained 
figures which do not differ significantly from Burt's; and the fact that his correlation 
for separated identicals is higher than anybody else's could very well be due to his 
pairs being children, whereas other workers like Newman and Shields used adults with 
a wide age range, whose test scores, therefore, would certainly be less reliable. 
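
The effect of setting Burt's pairs aside can be gauged roughly from the figures quoted above. The sketch assumes, purely for illustration, that the mean of 0.82 is an average weighted by the number of pairs in each study (the text does not say how it was formed), and it reads the correlation for separated identicals directly as the genetic share of variance, as the text does. The function name is hypothetical.

```python
def mean_without_study(overall_mean, total_pairs, study_mean, study_pairs):
    """Back out the pair-weighted mean correlation of the remaining studies.

    Assumes overall_mean is a simple pair-weighted average, which is an
    illustrative assumption rather than something stated in the text.
    """
    remaining_pairs = total_pairs - study_pairs
    remaining_sum = overall_mean * total_pairs - study_mean * study_pairs
    return remaining_sum / remaining_pairs


# Figures quoted in the text: 122 pairs in all with a mean of 0.82,
# of which Burt contributed 53 pairs with a correlation of 0.88.
r_others = mean_without_study(0.82, 122, 0.88, 53)

genetic_share = r_others              # separated-identical correlation read as heritability
environmental_share = 1.0 - r_others  # remainder attributed to environment

print(f"mean correlation without Burt: {r_others:.2f}")
print(f"genetic share: {genetic_share:.0%}, environmental share: {environmental_share:.0%}")
```

Under these assumptions the remaining 69 pairs average about 0.77, so the estimate falls only modestly when Burt's data are dropped.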


Kamin's book, The Science and Politics of IQ (1974), does not merely attack Burt. 
He tries to pick holes in all other published studies that suggest genetic effects, and 
several reputable reviewers have rejected his methods and conclusions. If Burt was 
biased in one direction, Kamin is much more so in the opposite direction. However, 
he does make a very valid point about separated twin studies, namely that such twins 
are never randomly assigned to different parents. Although Burt himself denied it,
I am sure that the two members of any pair would usually be reared in homes
quite similar in socio-economic and educational status. Hence part of the high
correlation within such pairs could be due to environmental, and not only genetic,
similarity.

* Namely Newman et al. (1937), Shields (1962), Juel-Nielsen (1965) and Burt (1966).

More generally, what is called Genetic-Environmental covariance has been 
neglected by most authors of heritability analyses. This term refers to the obvious 
likelihood that intelligent parents who pass on superior genes to their offspring also 
usually provide above-average environments. However, three recent studies of kinship 
data by Morton (1972), Loehlin et al. (1975), and Jensen (1977) have separated off this 
component, and one can see from Table 2 that there is fairly close convergence on a 
genetic percentage of around 65, environmental 23, and covariance 12. Note, too, 
that the genetic per cent is well below the 80 per cent which Jensen advocated in 1969 
on the basis of Burt’s group test figures. I think it quite likely that if we could test 
larger samples under more carefully controlled conditions, the genetic percentage 
might drop below 60. But the precise figure doesn’t matter, so long as it is recognised
that both the genes and the environment have very substantial effects on child IQ, and
that the genetic component is probably the largest one, which we cannot afford to
ignore. In any case, Jensen and his critics point out that this figure, be it 80 or 65 or
50 per cent, is not an absolute one; it is relative to the particular population such as
British or American whites, and would alter if the range of environmental differences
altered. I have also neglected the complications of dominance effect and of assortative
mating, since these do not affect the main argument.


TABLE 2


HERITABILITY ANALYSES INCLUDING ESTIMATES FOR
GENETIC-ENVIRONMENTAL COVARIANCE


               Jencks    Loehlin    Morton    Jensen
                         et al.

VG              0.45      0.61       0.68      0.65
VE              0.35      0.23       0.19      0.28
G x E Cov.      0.20      0.15       0.14      0.07
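
The convergence described above can be checked against the Table 2 figures. The sketch below takes the unweighted mean of each component across the three analyses named in the text; the unweighted averaging and the names `table_2`, `recent` and `mean_percent` are assumptions made for illustration, not the author's stated procedure.

```python
# Proportions of IQ variance from Table 2: genetic (V_G), environmental (V_E),
# and genetic-environmental covariance (Cov), one entry per analysis.
table_2 = {
    "Jencks":  {"V_G": 0.45, "V_E": 0.35, "Cov": 0.20},
    "Loehlin": {"V_G": 0.61, "V_E": 0.23, "Cov": 0.15},
    "Morton":  {"V_G": 0.68, "V_E": 0.19, "Cov": 0.14},
    "Jensen":  {"V_G": 0.65, "V_E": 0.28, "Cov": 0.07},
}

# The three analyses the text cites as converging.
recent = ["Morton", "Loehlin", "Jensen"]


def mean_percent(component):
    """Unweighted mean of one component across the recent analyses, in per cent."""
    values = [table_2[study][component] for study in recent]
    return round(100 * sum(values) / len(values))


summary = {component: mean_percent(component) for component in ("V_G", "V_E", "Cov")}

# Each column should account for nearly all the variance, allowing for rounding.
column_totals = {study: round(sum(parts.values()), 2) for study, parts in table_2.items()}

print(summary)
print(column_totals)
```

The means come out at 65, 23 and 12 per cent, matching the figures given in the text, and every column sums to within 0.01 of unity.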


In view of the many difficulties associated with twin or other kinship data, the 
chief alternative approach is through foster children, since here we can study environ- 
mental effects without any genetic connection between child and foster parent. 
Unfortunately there are still a lot of complications, as Munsinger (1975) pointed out 
in a recent review, and the results of different investigations vary greatly. However, 
there is a fair consensus that adoption at an early age into a good home tends to raise 
the IQs of adoptees, though probably not more than about 10 points on average. The 
correlation between child IQ and measures of foster-parent ability or education, or the 
home rating, are mostly quite small. From six published studies (see Table 3), I found
a median figure of 0.23, and this is almost certainly boosted by the tendency for
selective placement, that is the attempt by the adoption agency to match the child with 
the foster home level. In some studies it was possible to get estimates of the ability 
of the true, or biological, parents, though the data are seldom complete. In another 
six studies (Table 4), I found a median figure of 0.30 (or probably higher). In spite of
Kamin’s obsessional efforts to demolish the published investigations, we must, I think, 
conclude that the genetic influence of true parents who did not rear the children is 
greater than the environmental effects exerted by the foster parents. So this line of 
evidence confirms rather neatly the conclusion from kinship analyses. 


TABLE 3


CORRELATIONS OF FOSTER CHILD IQ WITH FOSTER PARENT
ABILITY, OR HOME LEVEL


Study                  N      Correlations

Freeman et al.        401     0.39 to 0.52
Burks                 214     0.23 to 0.42
Leahy                 194     0.18 to 0.24
Skodak & Skeels       139     0.04 to 0.20
Horn & Loehlin        146     0.09 to 0.15
Munsinger              41     -0.14

Approximate median            0.23


TABLE 4


CORRELATIONS OF FOSTER CHILD IQ WITH ABILITY OF BIOLOGICAL PARENT(S)


                              Age of
Study                  N      testees     Correlations

Munsinger              41     8½          0.70
Skodak & Skeels        63     13½         0.38 to 0.44
Horn & Loehlin        146     ?           0.32
Lawrence              185     9-14        0.26 (with father’s SES)
Skodak & Skeels       139     7½          0.23
Snygg                  70     5½          0.12

Approximate median                        > 0.30


There are a number of other types of evidence which contribute to the case for
genetic determination of intelligence: the fact that rats and dogs can be bred to
produce bright and dull strains; the tendency for close human inbreeding to yield
congenital malformations and mental defect; and the discovery that specific gene
anomalies produce psychological syndromes such as Down’s and Turner’s. A point
which strikes me as highly convincing is that children often do not resemble their
parents or their siblings in intelligence. Resemblances between them (amounting to
an average correlation of 0.5) could plausibly be attributed to the effects of common
upbringing. But the fact that professional parents can have quite dull children, and
lower-class, poorly educated parents have very bright children, would be expected on
genetic though not on environmental theory.


Let us now look at some of the more striking studies of environmental effects. 
Some remarkable investigations in recent years by Trevarthen (1974) at Edinburgh, 
also Schaffer (1977), Bower (1974), and Jerome Bruner (1975), have brought out the 
importance of mother-infant interactions in promoting cognitive development. Also 
longitudinal follow-up investigations like the Berkeley Growth Study, and the work 
at the Fels Institute, have yielded substantial correlations between certain types of 
parental handling of children, and their later intelligence. However, such results are 
difficult to interpret, since they might also be explicable genetically. I have mentioned
already the substantial genetic-environmental covariance effect. In addition, children 
with superior genes tend to exploit and even shape their own environments more 
effectively than dull ones. They are more interested in reading, ask more questions, 
and explore more actively. In other words, the genes may affect the environment 
rather than the environment affecting the intelligence. Hence the evidence from 
intervention studies is more convincing, where children are submitted to specially 
stimulating or deprived environments, and can be compared for intelligence with 
control groups. 


Several cases of severe deprivation have been described in the literature, that is 
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children who grew up in a highly restricted environment and had hardly any contacts 
with other human beings. As would be expected from interactionist theory, their 
mental development scarcely went beyond the imbecile level. Nevertheless Kolu- 
chova's (1972) pair of twin boys, and other cases, who were rescued and put into good 
foster homes as late as 7 years of age overcame their retardation. Their IQs rose from 
40 or below to 100 or above. Thus a highly unfavourable environment in early 
childhood does not necessarily bring about irremediable damage. 


In the late 1930s Skeels tested 24 orphans in a very unstimulating institution, at 
ages 7 to 30 months. Thirteen of them were then transferred to a hospital where they 
received a lot of care and individual attention from mentally defective girls, and most 
of them were later fostered out into lower middle-class homes. Skeels claimed a large 
improvement averaging 27½ IQ points in these transferred cases, whereas in the other
11, left in the original institution, there was a further drop of 26 points. Some 25 years 
later Skeels (1966) traced all these cases, and found the transferred ones to be normal, 
self-supporting adults, holding quite a range of skilled jobs, or else they were married 
women. The non-transferred were still either institutionalised, or in very low-grade 
jobs. The average numbers of years of schooling in the two groups were 11.7 and 4.0
years respectively. Though I would not put any credence in intelligence tests given at
such an early age, it seems reasonable to conclude that, as adults, the transferred cases 
averaged at least 30 IQ points higher than the others. 


Probably the best controlled study is that of Heber and Garber in Milwaukee (cf. 
Garber and Heber, 1977), though insufficient details have so far been published for us 
to evaluate the findings. Forty negro boys were selected at birth whose mothers 
scored 80 IQ or below, and who lived in a very poor neighbourhood. Twenty of them 
(the experimental group) attended a centre for seven hours a day, five days a week,
and underwent an all-out programme devised to improve their sensori-motor, language 
and thinking skills. Simultaneously their mothers were given an educational pro- 
gramme including home-making, child care, and vocational training. The other 20 
were brought up at home, but took the same periodic tests as the experimental group. 
The results are summarised in Figure 1. Up to about the age of 14 months the two 
groups remained closely parallel on the Gesell scale, but the control group began to 
fall behind after 18 months. On pre-school scales given between 2 and 4½ years,
Heber found mean IQs of 122.6 and 95.2, that is a superiority of 27.4 points among
the experimental group. Up to the age of 6 the Experimental means stayed between 
110 and 120, whereas the Controls dropped to around 85. The special programme 
ceased when the children entered first grade. By ages 8 to 9, the Experimentals 
dropped to an average of 104, while the Controls now averaged 80. Final figures are 
not yet available and, in the absence of further stimulation, it is possible that the 
Experimentals may show some further decline. But clearly they have a tremendous 
advantage over the Controls reared in their own homes. 


These results are all the more striking when compared with the virtual failure of 
the well-known Head Start experiment in the 1960s to produce any permanent gains. 
However, the Head Start programmes usually amounted only to a few months'
attendance at a nursery school for a few hours a day, before entering elementary
school. Since then, numerous intervention studies have been based more on helping
the mothers to stimulate and interact better with their 1- to 4-year-old children, by 
means of home visits from psychologists or specially trained teachers. Levenstein's 
(1970), Karnes and Teska's (1970), Bronfenbrenner's (1974), and other programmes 
have produced gains of some 13 to 20 IQ points, though there is not much evidence 
yet of the permanence of the effects. Another advantage of such schemes is that they 
are relatively inexpensive, probably costing less than Head Start, whereas the Mil- 
waukee experiment was far too expensive ever to be applied to large numbers of 
children. 


FIGURE 1
IQs OF EXPERIMENTAL AND CONTROL GROUPS IN THE STUDY BY GARBER AND HEBER (1977)
(NOTE.—The last two points at 7 and 9 years are approximate only, being based on a lecture, not a
published article)


We might expect even more effective intervention if black, or American Indian,
children, for example, were adopted shortly after birth by white parents, and reared 
in a superior white environment. Several reports have appeared suggesting that such 
children grow up to be as intelligent as white children in their own families, but usu- 
ally there is no control over selective placement, and the true parents could well have 
been of above average intelligence (see Loehlin et al., 1975). The most thorough study
is that of Scarr and Weinberg (1976) of 99 negro or part-negro children whose true 
parents were of about average education for the area. Thus the children, if reared in 
their own homes, would have obtained a mean IQ of about 90. Twenty-nine children 
were known to have two black parents, and when tested after the age of 4 in the foster 
homes, their mean was 97. Sixty-eight others with one black, one white, parent 
averaged 109. But white children of the adoptive parents averaged 118. Apparently 
then the improved environment does produce a substantial gain, though the white- 
black mean difference is certainly not wiped out. Thus more data of the same kind are 
urgently needed. I should add that I am not taking any stand myself on the issue of 
racial differences in intelligence, since it is much more difficult than it is with individual 
differences to assess separately the genetic and environmental components. I see no 
reason why there should not be some innate psychological characteristics, just as there 
are innate physical differences, but I would expect them to be small in comparison with 
the very large cultural and environmental differences. Also, minority group children 
are much more susceptible to pre- and perinatal handicap, and to malnutrition, than 
white children, and this might explain part of the difference. Unfortunately no one has 
yet been able to define the precise environmental factors which particularly handicap 
minority group children, hence environmentalist theories are almost entirely specu-
lative. As Urbach (1974) remarked: “Everything in the world can be explained by
factors that we know nothing about”.


Although the total evidence of environmental effects on individual intellectual 
growth is very strong, the amount of improvement in IQ brought about by fostering
or intervention does not exceed what would be expected from the figure of 20 to 30 
per cent environmental variance that I suggested earlier. True, there are occasional 
cases like Koluchova's twins who register far greater changes, but they are so abnormal 
that they can hardly be said to invalidate the heritability analyses. 
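
As a rough illustration of the reasoning above, and not a calculation taken from any of the studies cited, the sketch below converts an environmental share of the variance into IQ points. It assumes a population SD of 15 points and a simple additive model in which the environmental component is independent of the genetic one (so the covariance effect mentioned earlier is ignored). The function names are hypothetical.

```python
import math

IQ_SD = 15.0  # assumed population SD of IQ scores (conventional scaling)


def environmental_sd(env_share, total_sd=IQ_SD):
    """SD of the environmental component when it accounts for
    env_share (a proportion, e.g. 0.20) of the total variance."""
    return total_sd * math.sqrt(env_share)


def expected_shift(env_share, env_sd_units, total_sd=IQ_SD):
    """IQ-point change for a move of env_sd_units environmental SDs,
    e.g. from a very poor to a very good environment."""
    return env_sd_units * environmental_sd(env_share, total_sd)


# Environmental shares of 20 and 30 per cent, as suggested in the text.
for share in (0.20, 0.30):
    print(
        f"share={share:.2f}  "
        f"environmental SD={environmental_sd(share):.1f}  "
        f"shift for 3 SDs={expected_shift(share, 3):.1f}"
    )
```

Under these assumptions a move of three environmental SDs corresponds to roughly 20 to 25 IQ points, which is the order of magnitude of the gains reported in the fostering and intervention studies.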


WHAT OF THE FUTURE? 


Lastly, let us ask what of the future? I shall be neither surprised, nor sorry, if 
group tests of children's intelligence disappear, particularly within elementary or 
primary schools. There is likely to be much less criticism by educationists and parents 
of instruments called Verbal or Non-verbal Reasoning tests—that is the name adopted 
for the Moray House series many years ago. Likewise the Scholastic Aptitude tests 
of the American College Entrance Board, or so-called factor tests such as verbal, 
reasoning, spatial, etc., are more acceptable, and measure very much the same thing. 
They can be useful in rough sorting of secondary school students by ability, or for 
admission to American universities, and for some occupational selection as in the 
armed services. But we must discourage the notion that they measure a single, global, 
innate intelligence, or that they predict by themselves the educational or occupational 
potential of the testees. They should be regarded as only one piece of evidence whose 
validity for particular purposes has been demonstrated. That is, they should be 
combined with evidence from other sources such as scholastic or work record,
interview judgments, or special aptitude tests such as mechanical, clerical, and so on. Also
their low reliability for long-term predictions should be borne in mind. Where the 
test content or format appear unfair to minority groups such as American blacks, or 
coloured immigrants to Britain, their validity and regression coefficients should be 
specially investigated. Group tests will also continue to be useful as control variables 
in educational or psychological researches in assessing the representativeness of a 
population sample, or for matching contrasted samples. Since we are concerned with 
group characteristics in this context, the limited efficiency of the tests for individual 
diagnosis is less of a drawback. 


Next, I do not foresee any serious problem in using individual tests like WAIS 
with adult clinical patients, though it would be useful to have other more diagnostically 
valid tests like the Reitan battery to supplement them. But I am more dubious about 
Stanford-Binet and WISC or WPPSI for children, since these are so closely associated 
with the concept of innate potential. Besides the lay critics, a great many psychologists 
seem to think they have outlived their usefulness. The view is commonly expressed 
that they give an over-static picture of the developing child; they merely measure end- 
products, rather than psychological processes, and thus throw no light on how the 
retarded or maladjusted child reached that state, and how he might grow out of it. 
However, the available alternatives which are supposed to tell us more about pro- 
cesses seem to consist mainly of Piagetian tasks, ITPA, and so on, whose shortcomings 
I mentioned earlier. Moreover, the well-trained Binet or Wechsler tester does find out 
quite a lot about processes from his qualitative observations of the child’s ways of 
tackling items and the kinds of errors he makes. True, these judgments are subjective, 
and I would entirely agree that there is room for other supplementary tests to tell us 
more about children’s cognitive styles and strategies, and specific learning disabilities, 
if someone would invent them. Those psychologists who simply condemn all intel- 
ligence testing could do a constructive and useful job by producing some better 
diagnostic tools. 


I do not see any particular virtue in retaining the term intelligence as such, except 
that the general factor which runs through a wide variety of cognitive skills is too large 
to be ignored, and must be called something. And it does undoubtedly account for a 
great deal of the variance in cleverness vs. dullness of children, either in everyday life, 
or at school. But it might be preferable to use a series of factor tests, like the
McCarthy scales (1972), which yield Indices for Verbal, Perceptual-Performance,
Quantitative, Memory and Psychomotor abilities. The first three of these are com-
bined to give a General Cognitive Index. These are limited to the 2½ to 8½ year range.
The new British Ability Scale (Elliott, 1974) covers from 2½ to 17 years, but it includes
over 20 separately standardised tests, from which the tester can select those thought 
relevant to the particular child. The combined score on four designated tests yields a 
conventional IQ. So far there seems to be no evidence whether school or clinical 
psychologists regard either of these modern tests as preferable instruments to the 
Binet or WISC. 


Several writers like Robert Glaser (1977) and Benjamin Bloom (1976) take the 
view that IQ tests are predictive of achievement only in a monolithic educational 
system. If we could provide more adaptive or individualised programmes to suit each 
particular child, the correlation would be much reduced, and we could make better 
predictions by means of a series of criterion-referenced tests, to show just what stage 
a child has reached in each subject, and what he is ready to go on to next. Their work 
is impressive, but it seems to me that its applicability is limited mainly to arithmetic, 
perhaps natural sciences, and early stages in English. Also I suspect that there will 
still be wide individual differences in overall rate of progress, and that refusal to 
measure this general factor would amount to throwing away the baby with the bath 
water. While it is true that both intelligence and educational achievement tests depend 
partly on genetic factors, partly on home and school environment, it is not true that 
they both measure the same thing. Many investigations have shown that the heri- 
tability of measured intelligence is a good deal higher than that of achievement. 
Intelligence refers to the general reasoning and other cognitive capacities which are 
developed largely by stimulation received in the home and in leisure hours or peer- 
group activities, whereas achievement refers to the more specialised performance in 
school subjects which depends greatly on the quality of teaching and on children’s 
motivation to learn. Nevertheless, the intelligence factor gives us useful educational 
predictions in so far as children may usually be expected to be able to apply the 
reasoning capacities built up outside school to tackling any new topic in school. 


This is very different from the belief that IQ tests measure innate ability, and 
educational tests measure acquired knowledge—a belief which is still far too com- 
monly held by many teachers, and even by some educational psychologists. But if we 
accept, as I have been arguing, that genetical and environmental influences are both
very much involved in the development of human intelligence, then it may be con- 
cluded that our tests, regardless of whether they are called intelligence tests or 
something different, will continue to be of immense value in diagnosing the strengths 
and weaknesses both of backward, and of highly gifted, children. 


ACKNOWLEDGMENT.—Some paragraphs in this paper have been quoted from a forth- 
coming book: Intelligence: Heredity and Environment, to be published by W. H. Freeman. 


ADDENDUM 


My rejection, in this article, of fraudulence by Professor Burt was justified by the published 
evidence available at the time of writing. But only three months later I was informed that Professor 
L. S. Hearnshaw was about to publish a biography, entitled: Cyril Burt: Psychologist (Hodder and 
Stoughton, 1979). Hearnshaw had access to Burt's personal diaries, and I understand that Burt 
actually described faking some of the data on which his later articles were based. This was most 
evident in his work on identical twins, written when he was aged 70 to 88 years; but I understand 
that there are indications of deception even in work which appeared in the late 1940s. This is a 
tragic blow to those of us who admired him, and doubtless a welcome outcome to his detractors. I 
would point out, however, that rather little of what I wrote in the paragraph about Burt actually 
requires to be revised. 


A COMPARATIVE STUDY OF YOUNG CHILDREN'S
CLASSROOM ACTIVITIES AND LEARNING 
OUTCOMES 


By MARY ANN EVANS 
(Department of Psychology, University of Waterloo, Ontario) 


Summary. Ten ‘informal’ and ten ‘comparison’ primary grade classrooms matched
according to socio-economic neighbourhood, grade, and instructional organisation were
observed using the Pupil Activity Scan, an observation instrument specifically designed
for the present study to record pupil behaviour. Analysis of the observational data 
indicated that children in informal classes engaged in child-to-child interaction, play 
activities, individual conferences with the teacher, and independent work to a greater 
extent than comparison pupils. In the comparison classrooms, word analysis activities, 
printing activities, independent silent reading, and teacher-led group experiences were 
more prevalent. Substantial variation in curricula was also observed across all
classrooms. Testing of the children at the end of the school year indicated no differences
between the two groups in language development, problem-solving, fine motor-figural 
perception, role-taking, or understanding of classification, but the performance of pupils 
in the informal classrooms appeared to be lower with respect to reading and mathematics. 


INTRODUCTION 


IN recent years, increasing interest has been shown by Canadian and American 
educators in a type of primary education variously referred to as ‘informal’, ‘play-
based’, ‘Piagetian-derived’, ‘cognitive’, ‘child-centred’, ‘open’, and ‘British
Infant’. In spite of the willingness with which such a model (loose as it may be) has
been adopted, and the strong commitment of its supporters, relatively little research 
has been published objectively describing the forms it takes (Brandt, 1972) or validating 
its supporters’ claims and goals (Walberg and Thomas, 1974; Brainerd, 1978). This 
point becomes more of a concern when considered at a local level where alternatives 
to traditional education may be adopted with much eagerness, rhetoric and good 
intention but with little objectivity in the form of operational programme descriptions 
or ongoing monitoring and evaluation. When the time comes for an evaluation, it is 
the responsibility of the applied researcher to fill this gap on demand while at the same 
time providing material of interest to a wider research community. 


The present paper presents the study of such an approach to early childhood
education initiated in some 50 kindergarten through grade two classrooms within a
single school district in Ontario. Principals interested in the project volunteered their 
schools as settings for initiating the project. According to the descriptions provided, 
classes involved were to offer a ‘child-centred’, ‘play-based’ programme in which
children engaged in ‘self-selected’, ‘self-directed’ activities, in which reading instruc-
tion was to be based on ‘the child's own language’ through self-composed stories and
personal word banks, in which pupils would have ‘many opportunities to develop
oral language competence’ through interactions with adults and peers, and in which
‘meaningful learning’ was to be rooted in ‘concrete materials’ and ‘real experience’.
To encourage implementing these ideas within the classrooms, teachers designated as 
participants in the project received specialised group in-service training, individual 
assistance from consultants to the project, and specific equipment and instructional 
materials. 

Initially, as with many evaluation studies, the concern was with the potential 
effects the programme might have on the pupils. However, as pointed out by Wallen 
and Travers (1963) with respect to comparative studies of progressive education in the 
1930s, and Findlater and Rubin (1977) with respect to research in the 1960s, a major 
weakness of past studies has been in focusing on outcome differences of programmes
(or the lack thereof) to the exclusion of examining what happens within these pro- 
grammes. Given the varying extents to which a programme may be implemented 
(see, for example, Bissell, 1971), it seems highly questionable to compare pupil 
performance between programmes only reputed to be different without knowing 
whether curriculum differences, in fact, exist and what those differences are. 


With these considerations in mind, the present research proceeded in two parts. 
Part I was an observational study of these classrooms (labelled herein as ‘informal’)
and other ‘comparison’ classrooms to determine any differences between the
approaches and to examine curriculum variations across classrooms; Part II was a
comparison of pupil performance between the two groups of classrooms in a variety
of skill areas to assess the effects of the ‘informal’ approach.


METHODS: PART I 

Sample 

Ten primary grade classrooms formally associated with the project were selected 
as sample classrooms where the philosophy, procedures and activities advocated had 
been directly initiated through the project's consultants. Seven of these classes 
consisted of first-grade pupils; the remaining three formed heterogeneous groupings 
of both first- and second-grade pupils. As a ‘comparison’ to this group of ‘informal’
classes, ten other classrooms were selected. Each classroom was matched to its pair 
member in terms of socio-economic neighbourhood, physical and instructional 
organisation (open-concept team approach or self-contained classroom) and grade 
structure (first-grade or heterogeneous grouping). 


It must be emphasised that the comparison classrooms were in no way selected 
to represent curricula of an opposite or contrasting nature to that entailed in the 
project. Rather, they were simply other classrooms, providing a sample of classroom 
activities normally occurring without the active intervention of project consultants. 
Indeed, since the activities and techniques of the project had been publicised through 
in-services open to all teachers, it was conceivable that some activities associated with 
the project would be found within the comparison group. 


Observation instrument 

For this study, an observation instrument, the Pupil Activity Scan, was specifically 
designed. We had found existing observation procedures (e.g. Withall, 1949; Wright 
and Nuthall, 1970; Flanders, 1970) to be workable in fairly formal structured 
situations, but unsatisfactory in activity-type classrooms where many private and 
spontaneous conversations occur between teacher and pupil and among pupils them- 
selves (see also Walker and Adelman, 1975). More importantly, our focus was not on 
the teacher—her verbal behaviour, personality, influence, or contact—but rather on 
the children and their learning activities during the school day. 


The observation instrument developed entailed a time-sampling procedure by 
which 10-second samples of a pupil's behaviour were repeatedly taken across half-day 
observation sessions. For each of these brief intervals, a child's activity was observed, 
coded and recorded with respect to its theme or general nature, its course of direction, 
the pattern of verbal interaction, the pupil's involvement, and the materials actually 
used. A number of categories within each of these dimensions detailed the activities 
in more precise descriptive terms. In addition, the general teaching-learning context 
in which the child was found was noted as ‘teacher-led group work’, in which the
teacher actively led the activities of at least two pupils, or ‘independent work’, in
which the children worked without immediate supervision or on an individual basis
with the teacher. (Interested readers may obtain a sample recording form and
definition of the categories from the author on request.) As an illustration, a boy
playing fire engine with three companions would be coded in the appropriate cell of
the observation record as ‘fantasy play’, ‘student leads self’, ‘child-child
interaction’, ‘active involvement’, ‘fantasy props’, and ‘independent work’.
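
One way to picture a single coded 10-second interval is as a record with one field per dimension. The sketch below is only an illustration of that idea: the class and field names are hypothetical and are not taken from the Pupil Activity Scan recording form, while the category labels are those of the fire-engine example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivitySample:
    """One 10-second observation of one pupil (hypothetical layout)."""
    theme: str          # theme or general nature of the activity
    direction: str      # source of direction
    interaction: str    # pattern of verbal interaction
    involvement: str    # the pupil's involvement
    materials: str      # materials actually used
    context: str        # teaching-learning context


# The boy playing fire engine with three companions.
fire_engine = ActivitySample(
    theme="fantasy play",
    direction="student leads self",
    interaction="child-child interaction",
    involvement="active involvement",
    materials="fantasy props",
    context="independent work",
)

print(fire_engine)
```

Keeping the six dimensions as separate fields means that each can later be tallied on its own, which is how the category percentages reported in the results are organised.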


Procedure and data collection 

Three observers were trained to assist the investigator in collecting the classroom 
data to a criterion of at least 80 per cent inter-observer agreement per category. Any
category not reaching this criterion was dropped from the data collection. Since we 
were unsure about all the types of materials used in first-grade classrooms, observers 
simply wrote down the materials used for later categorisation. 
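
The 80 per cent criterion can be sketched as follows. The paper does not state how agreement was calculated, so the formula used here (intervals on which both observers used a category, divided by intervals on which either did) is an assumption, as are the function names and the example codes.

```python
def agreement_by_category(codes_a, codes_b):
    """Percentage agreement per category for two observers who coded
    the same sequence of intervals (one category label per interval)."""
    if len(codes_a) != len(codes_b):
        raise ValueError("observers must code the same intervals")
    pairs = list(zip(codes_a, codes_b))
    result = {}
    for category in set(codes_a) | set(codes_b):
        both = sum(1 for a, b in pairs if a == category and b == category)
        either = sum(1 for a, b in pairs if a == category or b == category)
        result[category] = 100.0 * both / either
    return result


def retained_categories(agreement, criterion=80.0):
    """Categories meeting the criterion; the rest would be dropped."""
    return sorted(c for c, pct in agreement.items() if pct >= criterion)


# Invented codes for five intervals, for illustration only.
observer_a = ["play", "play", "reading", "maths", "play"]
observer_b = ["play", "reading", "reading", "maths", "play"]

agreement = agreement_by_category(observer_a, observer_b)
retained = retained_categories(agreement)
print(agreement)
print(retained)
```

In this invented example only the category on which the observers never disagreed would be kept; a real training set would of course contain many more intervals per category.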


Each classroom was observed for two mornings and two afternoons on four days 
over a span of four weeks. Except for the case of illness, a single observer collected 
the data for a particular classroom. Although this procedure introduces the possibility 
of systematic observer error with respect to a classroom, it more likely provides data 
truly reflective of the classroom, given the extent to which class and observer become 
comfortable and familiar with each other. 


Over the course of the observations, approximately 50 samples of each child's 
behaviour were taken. The data were collapsed across pupils and the four half-day 
observation sessions to yield classroom totals for each category separately within the 
independent and teacher-led work learning contexts. These category totals were then 
converted to percentages for each of the two contexts. A third score for each category 
representing an ‘average’ whole day was also formed by combining the percentages
for the two contexts. Thus a summary profile for each classroom was formed
describing the overall activities of the children within it, such as the percentage of
activities involving story writing, the percentage of activities in which pupils were
passively involved, the percentage of activities in which pupils conversed with each
other, and so on during independent work, teacher-led group work, and across an
‘average’ whole day.
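
The collapsing of samples into percentages within each context can be sketched as below. This is a minimal illustration with invented samples and hypothetical names, not the procedure actually programmed for the study.

```python
from collections import Counter, defaultdict


def context_percentages(samples):
    """Turn (context, category) samples, pooled over pupils and sessions,
    into category percentages within each learning context."""
    counts = defaultdict(Counter)
    for context, category in samples:
        counts[context][category] += 1
    percentages = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        percentages[context] = {
            category: 100.0 * n / total for category, n in counter.items()
        }
    return percentages


# Invented theme codes for one classroom.
samples = [
    ("independent work", "story writing"),
    ("independent work", "mathematics"),
    ("independent work", "story writing"),
    ("independent work", "printing"),
    ("teacher-led group work", "music"),
    ("teacher-led group work", "story telling"),
]

pct = context_percentages(samples)
print(pct["independent work"])
print(pct["teacher-led group work"])
```

Each context is scaled to 100 per cent separately, so a category can loom large within a context that itself occupies little of the day.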


RESULTS: PART I 


To determine differences in classroom activities between the two groups, one-way 
analyses of variance using the SPSS computer programmes (Nie et al., 1975) were 
carried out. Any category in which the overall mean for both groups was less than
1 per cent of all categories within a dimension for a context was not included in the
analysis.


Independent work 

Table 1 contrasts the mean percentages for the two groups for observation 
categories in the independent work context. A number of significant group effects 
were found. As a textual illustration of the data in Table 1, it can be seen that a 
greater percentage of comparison pupils' activities were self-directed and involved 
silent reading or the acquisition of word analysis skills. In contrast, a greater per- 
centage of the activities of pupils in informal classrooms involved interacting on a 
one-to-one basis with the teacher, interacting with each other, orally reading, or 
exploratory play such as that found in sandboxes and water tables. Analysis of the 
materials categories (these category labels being detailed later in Table 3) revealed a 
greater use in the informal classrooms of large stacking objects (F = 11.44; df = 1,14;
P < 0.01), sandbox and water play areas (F = 9.51; df = 1,14; P < 0.01) and fantasy
props (F = 16.03; df = 1,14; P < 0.001). A greater percentage of activities in the
informal classes also involved the children reading or writing their own compositions
(F = 11.20; df = 1,14; P < 0.01). Overall, greater variety in both materials and
activities was found in the informal classes (P < 0.01).


TABLE 1

INDEPENDENT WORK: MEAN CATEGORY PERCENTAGES WITHIN THE DIMENSIONS OF SOURCE OF
DIRECTION, VERBAL INTERACTION, INVOLVEMENT, AND THEME OF ACTIVITY

                                           Informal       Comparison       F-Value
Category                                   Mean      SD    Mean      SD    df=1,18
Source of Direction
  Teacher works with individual pupil      9.37    5.73    3.80    2.?1       7.46*
  Pupil leads other pupils                 1.98    1.94    2.03    1.72       0.00
  Pupil leads self                        88.64    7.52   94.11    3.54       4.33*
Verbal Interaction
  Child-teacher interaction                9.72    5.62    3.96    2.08       9.25**
  Child-child interaction                 42.01    7.25   26.79    9.39      15.73***
  No interaction                          48.01    9.52   68.77   10.02      20.16***
Involvement
  Active pupil involvement                88.83    3.98   90.25    4.31       0.59
  Passive pupil involvement                6.86    2.56    5.43    2.92       1.35
  Little pupil involvement                 4.16    3.40    4.29    3.01       0.01
Theme
  Silent reading                           2.97    4.14    7.89    3.45       8.32**
  Oral reading                             4.04    1.77    0.92    1.14      21.89***
  Word practice                            5.61    4.01    6.31    4.47       0.14
  Word analysis                            1.84    1.54   11.94    7.47      17.53***
  Contextual meaning                       1.03    2.18    2.23    2.32       0.25
  Story writing                            3.29    2.61    1.66    1.90       2.55
  Printing                                 7.55    4.91   12.94    6.14       4.70*
  Mathematics                             13.56    7.34    9.45    6.87       1.67
  Fine motor/perceptual skills             9.33    5.20    9.01    6.12       0.02
  Creative arts                           20.15    4.89   21.17   10.60       0.08
  Fantasy role play                        8.68    4.22    4.51    6.68       2.78
  Exploratory/repetitive play              3.59    1.49    1.15    1.37      14.50***
  Prepare/finish off                       1.67    1.60    1.33    1.85       0.19
  Positive affect                          2.66    2.73    0.46    0.77       6.01*
  Unspecific behaviour                     1.69    1.62    1.30    2.23       0.20
  Number of different theme
    categories across observation         18.20    2.15   15.10    2.81       7.69**

* P<0.05   ** P<0.01   *** P<0.001
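
As a rough check on how the F-values relate to the means and SDs, a one-way analysis of variance on two groups of equal size can be recomputed from the summary figures alone. The sketch below assumes ten classrooms per group and that the tabled SDs are sample SDs (computed with n - 1); the function name is hypothetical, and this is not the SPSS routine used in the study.

```python
def f_from_summary(mean_a, sd_a, mean_b, sd_b, n):
    """One-way ANOVA F for two groups of equal size n, from their means
    and sample SDs. Degrees of freedom are 1 and 2n - 2."""
    # Between-groups mean square (1 df): n * (difference of means)^2 / 2
    ms_between = n * (mean_a - mean_b) ** 2 / 2.0
    # Within-groups mean square: average of the two sample variances
    ms_within = (sd_a ** 2 + sd_b ** 2) / 2.0
    return ms_between / ms_within


# Child-teacher interaction during independent work:
# informal 9.72 (SD 5.62), comparison 3.96 (SD 2.08), ten classrooms each.
f_value = f_from_summary(9.72, 5.62, 3.96, 2.08, n=10)
print(round(f_value, 2))
```

The result agrees with the tabled value of 9.25 to within rounding of the published means and SDs.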


Teacher-led group work 

Within the teacher-led group learning context only one significant difference was 
found: a greater percentage of activities directed towards word analysis or phonics 
skills was observed in the comparison (13.38 per cent) than informal (5.37 per cent)
classes (F = 17.53; df = 1,18; P < 0.01). No significant differences were found on the
theme categories oral language arts, story-telling, music, or any of the other categories 
listed for independent activities in Table 1, or materials categories listed in Table 3. 


Whole day 

Category percentages for an ‘average’ whole day depended both on the extent
to which the category was checked within a particular context (i.e. teacher-led group
vs. independent work) and the extent to which that teaching-learning context was
observed in the classroom. For example, story-telling which might occupy a large
percentage of activities within teacher-led group work would not occupy a large
percentage for the whole day if the teacher-led group context itself was little used. In
fact, a major difference between the two groups was found in the extent to which each
of these two contexts was employed: teacher-led group activities occupied an average
of only 36.41 per cent of the school day in the informal classes compared with 57.31
per cent in the comparison classes (F = 15.39; df = 1,18; P < 0.001).


In order to summarise activities for each classroom across an average whole 
day (Tables 2 and 3), category percentages for each of the two contexts were prorated 
according to the extent of those contexts and then combined. Analysis of the data 
revealed that in keeping with the greater amount of teacher-led group work in 
comparison classes, a greater percentage of pupil activities was under the immediate 
direction of the teacher in the comparison classes than in the informal classes
(F = 9.60; df = 1,18; P < 0.01). Similarly, a greater percentage of comparison class
activities involved passive pupil involvement (F = 6.71; df = 1,18; P < 0.05), that is,
activities in which the child attended to the responses or activities of others rather
than playing a central role himself at the point of observation. In contrast, children
in the informal classrooms were to a greater extent actively involved (F = 5.89;
df = 1,18; P < 0.05) and self-directed (F = 9.16; df = 1,18; P < 0.01) in their activities.
A greater percentage of their activities than comparison pupils' activities also
involved private conversations with the teacher (F = 16.81; df = 1,18; P < 0.001) and
conversations with each other (F = 15.73; df = 1,18; P < 0.001).


TABLE 2

WHOLE DAY: MEAN CATEGORY PERCENTAGES WITHIN THE DIMENSIONS OF SOURCE OF
DIRECTION, VERBAL INTERACTION, AND INVOLVEMENT

                                      Informal         Comparison       F-Value
Category                             Mean      SD      Mean      SD     df=1,18
Source of Direction
  Teacher works with pupil
    (individual and group)          41.66   12.27     58.02   11.33      9.60**
  Pupil leads other pupils           1.53    1.21      1.60    1.45      0.01
  Pupil leads self                  56.79   12.60     40.55   11.36      9.16**
Verbal Interaction†
  Child-teacher interaction          6.04    3.19      1.69    1.04     16.81***
  Child-child interaction           27.50    8.86     12.50    7.15     15.73***
  No interaction                    30.21    6.13     28.49    5.20      0.41
Involvement
  Active pupil involvement          76.77    4.89     70.43    6.46      5.89**
  Passive pupil involvement         19.83    5.86     27.08    6.34      6.71**
  Little pupil involvement           3.44    3.03      2.44    1.96      0.77

† These categories represent interaction patterns within independent/one-to-one work prorated
to indicate percentages for these patterns considering the whole day. Hence they total to 63.75 per
cent and 42.68 per cent for the informal and comparison group respectively.

** P < 0.01   *** P < 0.001
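The footnote to Table 2 states that the three verbal-interaction categories, once prorated to the whole day, total 63.75 and 42.68 per cent. A minimal sketch in Python that checks this arithmetic from the tabled means (the dictionary layout and function name are illustrative assumptions, not part of the original analysis):

```python
# Verbal-interaction means copied from Table 2 (per cent of the whole day).
verbal_interaction = {
    "informal": {"child-teacher": 6.04, "child-child": 27.50, "no interaction": 30.21},
    "comparison": {"child-teacher": 1.69, "child-child": 12.50, "no interaction": 28.49},
}


def prorated_total(group):
    """Sum the three prorated verbal-interaction percentages for one group."""
    return round(sum(verbal_interaction[group].values()), 2)


for group in verbal_interaction:
    print(group, prorated_total(group))
```

The two totals agree with the figures quoted in the footnote, which suggests the remaining share of each group's day fell outside independent or one-to-one work.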


Mean whole day percentages for the remaining theme and materials categories
for the two groups are contrasted in Table 3. Here it can be seen that across the
whole day a greater percentage of activities within the comparison classrooms dealt
with word analysis or phonics (P < 0.001) and the meaning of textual material
(P < 0.05) while a greater percentage of those in the informal classrooms consisted of
exploratory or repetitive play (P < 0.001). In addition there was a tendency towards a
greater percentage of creative arts activities (P < 0.10) and fantasy play activities
(P < 0.10) in the informal classrooms. Across the whole day, the extent to which
certain types of materials were used also significantly differed. More activities in the
comparison classrooms involved objects for display purposes or illustration (P < 0.05),
while a greater variety of materials was found in the informal classes (P < 0.01)
including greater use of large stacking objects (P < 0.002), sandbox/water tables
(P < 0.01), fantasy props (P < 0.01), pupil-produced texts (P < 0.10) and personal word
banks and listings (P < 0.10).


TABLE 3

WHOLE DAY: MEAN CATEGORY PERCENTAGES WITHIN THE DIMENSIONS OF
THEME OF ACTIVITY AND MATERIALS USED

                                      Informal         Comparison       F-Value
Category                             Mean      SD      Mean      SD     df=1,18
Theme
  Silent reading                     4.41    5.37      7.48    2.79      2.57
  Oral reading                       4.80    2.92      3.02    1.97      2.56
  Word practice                      6.55    5.11      5.98    3.14      0.77
  Word analysis                      3.29    2.71     12.29    5.65     20.64***
  Contextual meaning                 2.87    3.28      5.21    1.55      4.16*
  Story writing                      2.29    2.02      1.60    2.13      0.55
  Printing                           6.15    3.65      7.07    2.36      0.45
  Oral language arts                 7.38    4.05     10.66    5.03      2.58
  Mathematics                       11.93    5.92      9.15    5.04      1.28
  Fine motor/perceptual skills       5.82    3.57      4.41    4.00      0.69
  Creative arts                     12.80    4.37      9.25    3.67      3.87
  Fantasy/role play                  5.83    4.23      2.82    3.34      2.78
  Exploratory play                   2.35    1.37      0.52    0.54     15.47**
  Music                              5.85    4.13      6.50    5.21      0.10
  Story telling                      4.26    3.67      6.49    5.38      1.17
  Unspecific behaviour               1.21    1.22      1.55    2.28      0.17
  Number of different theme
    categories                      21.00    1.49     19.60    1.78      3.64
                                                                        df=1,14
Materials
  Desk arts and crafts              11.01    5.35     11.90    5.44      0.11
  Non-desk arts and crafts           4.01    2.90      2.68    1.66      1.28
  Small locking/stacking             4.39    2.84      3.10    3.81      0.59
  Large stacking                     1.95    1.31      0.13    0.18     15.32**
  Visual matching                    0.89    0.94      0.90    1.35      0.00
  Sandbox/water table                1.98    1.52      0.30    0.60      8.40**
  Fantasy props                      5.73    4.26      0.71    1.13      9.93**
  Pupil-produced text                3.64    1.74      1.91    1.83      3.73
  Basal reader                       5.45    6.10      6.11    8.29      0.05
  Other text                        11.69    6.86     12.18    8.65      0.02
  Personal word bank                 3.73    3.12      1.03    2.62      3.51
  Display                            8.88    6.63     19.03    8.45      7.15*
  Paper and pencil                   4.51    2.09      7.66    5.29      2.46
  Duplicated papers                 11.84    6.54     12.75    8.00      0.06
  Audio-visual aids                  0.99    0.99      3.09    3.60      2.53
  Number of different materials
    across observations             16.63    1.06     14.13    1.73     12.17**

* P < 0.05   ** P < 0.01   *** P < 0.001

NOTE. It was felt that one observer's notations on the materials used were sometimes too
unspecific for reliable coding. Hence, data from 16 rather than 20 classes were analysed for the
materials categories.


SUMMARY: PART I
The results strongly supported the implicit assumption underlying any com- 
parative programme evaluation—in this case, that the classrooms associated with the 
programme to be evaluated were offering a curriculum substantially different from that 
otherwise being offered. In spite of the variation between classrooms within each
group (see standard deviations in the tables) the significant differences obtained 
demonstrated such activities as play, individual pupil-teacher conferences, independent 
work, pupil-to-pupil talk, and story-writing to be in greater evidence in the project 
classrooms. These differences were all very much in line with the project’s stated 
objectives, tenets and suggested procedures: classrooms that we initially labelled 
informal did appear to be just that. Having quantitatively and objectively established 
these differences, we felt confident in proceeding with a comparative study to determine 
possible effects of the informal classrooms’ approach to primary education. 


METHODS: PART II 


Sample 

The same 20 classrooms that had earlier been observed were involved in the 
second part of the study. As a check to ensure that each of the classrooms initially 
designated as informal was more like other members of that group than comparison 
classes, and vice versa, a discriminant analysis (Nie et al., 1975) was applied to the 
data of Part I. The discriminant function formed, using the variables teacher-led 
group work, whole day word analysis activities, individual pupil-teacher conferences, 
child-to-child interaction, oral reading during independent work, and silent reading 
during independent work, classified the 20 classrooms into the two groups with 
complete accuracy. Thus each classroom was viewed in practice as an appropriate 
representative of either the informal or comparison curricula. 


Measures 

Goodlad (1969) has warned that in evaluating educational innovations “the
researcher simply cannot go in with his stable research—his conventional criteria, his 
timeworn measures—and expect to contribute to the advancement of educational 
practice and science” (p. 105). Hence, children within the 20 classrooms were tested
in a variety of skill areas thought to be possibly enhanced by the informal curricula 
being implemented. The standard tests were the Coloured Progressive Matrices 
(Raven, 1956) as a measure of non-verbal problem-solving ability, and three subtests 
of the Stanford Achievement Battery—Reading Comprehension (employing a 
multiple-choice ‘cloze’ format), Mathematics Concepts, and Mathematics
Computations and Applications. Raw scores for each of these measures were used.


Fine motor and figural perception skills were assessed by the Geometric Retention 
Test developed by Cash (1976). In this test children are presented with a set of ten
geometric designs, each design being seen for a brief interval and then drawn from 
memory. Both raw scores, and converted scores taking age into account representing 
the children's IQs, were derived from their drawings. 


A set of classification tasks used by Kofsky (1966) assessed the children's under- 
standing of classification operations outlined by Inhelder and Piaget (1958). Role- 
taking ability was also assessed using a procedure developed by Flavell et al. (1968). 
In this task, a child sees a set of seven pictures portraying a fierce dog chasing a boy 
who climbs an apple tree and then eats an apple. The child narrates the story, the 
pictures relating to the fierce dog are removed, and the child narrates the story again 
from the perspective of a second person who sees only the remaining pictures. This 
second narrative is rated according to the shift in motivation attributed to the boy for 
climbing the tree. 


In addition we attempted to measure language development by collecting speech 
samples relating to the interpretation of pictures, explaining of games and spontaneous 
story-telling. These speech samples were transcribed from audio-tape and received a 
score for the total number of words, mean length of utterance, and Developmental 
Sentence Level (Lee and Canter, 1971). Unfortunately, some of the audio-tapes were
not clear enough to be transcribed reliably and our data base for these measures was 
reduced to 10 first grade and four heterogeneous classes. 


As additional measures of reading ability, an individual informal reading test was 
given requiring the pupils to read a primer, grade one and grade two passage and 
orally to answer questions relating to each passage. Since the number of questions 
varied, raw scores for these passages were converted to scores out of 100. In addition, 
an estimation of the pupils' instructional and independent reading level was made 
on the basis of these scores. Finally, teachers were asked independently to rate the 
children's overall reading level on the basis of their classroom performance. 


Procedure 

Children in the 20 classrooms were tested at the end of the school year. Any child 
who had been admitted to a classroom after January 1st was not included in the test
data, thereby ensuring that all children would have maximally experienced that 
classroom's curriculum and many of the learning activities earlier described in Part I. 
In addition, serious concerns were expressed over administering reading tests to 
children known to be only at a reading readiness level. To meet this concern, children
whom the classroom teacher identified in advance to be at such a level were exempted 
from all testing. An average of 12 per cent of the pupils were thereby exempted, with 
no statistical difference being found between the two groups of classrooms. 


Group tests were administered to the whole class (with the above exceptions) and 
individual tests given to a random sample of at least eight children per class (between 
30 per cent and 70 per cent of the number of eligible pupils in a class). Tests were 
administered in the same order in all classrooms. While it would have been preferable 
also to collect co-variate data, it was impossible to do so before the children had 
experienced either of the two curricula, and no consistent information on the children 
had previously been collected. However, there was no reason to suspect different 
distributions of pre-requisite skills in the two groups, given that the schools had been 
matched for socio-economic neighbourhood. 


RESULTS: PART II


In the case of interval data, one-way hierarchical analyses of variance were 
performed on the test results of the grade one children using the BMDP computer 
programmes (Dixon, 1975). Data obtained from the heterogeneously grouped classes 
were not included since the scores of the second-grade children would have introduced 
extreme within-group variance. Table 4 displays the group means, group standard 
deviations, and range of classroom mean scores. 


The hierarchical ANOVAs revealed no significant differences between the groups
in any of the data relating to visual-motor skill, IQ, non-verbal problem solving, or
classification ability, the first four listings in Table 4. No significant differences
between classrooms within the two groups were found on these same measures:
variation across classes was roughly equal to that across individual children. Had
there been any trend towards significance at all in the converted IQ scores, one would
be wary of proceeding with any further data analysis. Fortunately, such was
not the case.


TABLE 4

MEAN PERFORMANCE ON TEST MEASURES IN FIRST-GRADE INFORMAL AND
COMPARISON CLASSROOMS

Test Measure                  Group          Mean      SD     Range
GRT Raw Score                 Informal       95.24    3.66     89.40- 98.86
                              Comparison     92.48    3.84     89.57- 98.64
GRT Converted Score           Informal      116.38    4.63    108.20-121.89
                              Comparison    112.73    5.87    104.57-121.80
Progressive Coloured          Informal       19.51    1.97     16.50- 22.88
  Matrices                    Comparison     19.64    1.49     17.38- 21.13
Classification Task           Informal       13.97    1.08     12.40- 15.63
                              Comparison     13.52    0.53     12.75- 14.22
Primer Passage                Informal       59.10   11.97     37.50- 71.45
                              Comparison     67.37    9.72     56.00- 82.50
Grade One Passage             Informal       48.99   10.02     34.75- 64.13
                              Comparison     55.38    5.07     48.64- 62.30
Grade Two Passage             Informal       37.09    9.68     23.13- 47.00
                              Comparison     39.08    6.94     25.50- 46.50
Stanford Reading              Informal       23.98    6.72     10.73- 29.81
  Comprehension               Comparison     27.42    3.61     22.25- 32.00
Stanford Mathematics          Informal       18.36    1.76     16.31- 21.29
  Concepts                    Comparison     20.26    1.69     18.57- 23.33
Stanford Mathematics          Informal       20.36    4.16     14.47- 25.78
  Computation and             Comparison     21.91    1.97     19.71- 24.11
  Applications
Number of words               Informal      295.58   35.61    246.88-336.29
                              Comparison    301.31   45.71    260.50-353.33
Mean length of utterance      Informal        5.74    0.78      4.97-  6.75
                              Comparison      5.84    0.53      5.04-  6.46
Developmental Sentence        Informal        8.24    0.79      7.30-  9.29
  Score                       Comparison      8.45    0.68      7.61-  9.34


Similarly, no significant effects were found for either groups or classes within
groups in the data for the Grade One Passage or Grade Two Passage. However, for
the remaining scores in reading and mathematics, significant F-ratios resulted. The
analysis is summarised in Table 5. Here, the effect for group was significant: the
informal group mean was significantly lower than the comparison group mean on
the Primer Passage and each of the standardised achievement subtests. However, the
effect for classes within groups was also significant on these measures. More
conservative F-tests were performed by using the classes within groups mean square as the
error term for the group mean square. This procedure controls for differences
between classes within groups, and reduced the F-ratios for group effect to
non-significant values, except for the Mathematics Concepts scores; here the informal
group mean again proved to be significantly lower (F = 4.59; df = 1,12; P < 0.05).
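The more conservative test amounts to a change of denominator in the F-ratio for groups. A minimal sketch in Python, using the mean squares reported in Table 5, shows both ratios side by side (the function and variable names are illustrative assumptions and are not taken from the BMDP programmes):

```python
# Mean squares copied from Table 5:
# (groups MS, classes-within-groups MS, error MS).
table5 = {
    "Primer Passage": (2326.63, 1018.42, 453.92),
    "Stanford Reading Comprehension": (689.78, 416.32, 82.30),
    "Stanford Mathematics Concepts": (217.75, 47.46, 15.98),
    "Stanford Mathematics Computation and Application": (148.45, 187.36, 26.66),
}


def f_ratios(ms_groups, ms_classes, ms_error):
    """Return (F for groups against the pupil-level error term,
    F for groups against the classes-within-groups term)."""
    original = ms_groups / ms_error
    conservative = ms_groups / ms_classes
    return original, conservative


for measure, mean_squares in table5.items():
    original, conservative = f_ratios(*mean_squares)
    print(f"{measure}: original F = {original:.2f}, conservative F = {conservative:.2f}")
```

For Mathematics Concepts the second ratio comes to 4.59, the value quoted in the text. Dividing by the classes within groups mean square treats the classroom rather than the individual pupil as the unit of analysis, which is why the degrees of freedom fall to 1 and 12.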


On the language measures, no significant differences between groups were
observed. Again, however, significant effects for classes within groups were found for
the mean length of utterance (F = 4.66; df = 8,69; P < 0.001) and the Developmental
Sentence Scores (F = 2.61; df = 8,69; P < 0.01).


TABLE 5

SUMMARY OF HIERARCHICAL ANOVAS ON FOUR READING AND MATHEMATICS
ACHIEVEMENT MEASURES

Test Measure              Source                    df        MS        F        P
Primer Passage            Groups                     1   2326.63     5.13    <.025
                          Classes within Groups     12   1018.42     2.24    <.013
                          Error                    125    453.92
Stanford Reading          Groups                     1    689.78     8.38    <.004
  Comprehension           Classes within Groups     12    416.32     5.06    <.001
                          Error                    233     82.30
Stanford Mathematics      Groups                     1    217.75    13.63    <.001
  Concepts                Classes within Groups     12     47.46     2.97    <.001
                          Error                    238     15.98
Stanford Mathematics      Groups                     1    148.45     5.57    <.019
  Computation and         Classes within Groups     12    187.36     7.03    <.001
  Application             Error                    244     26.66


The remaining scores for the role-taking, instructional reading level, independent
reading level, and teacher reading ratings were of an ordinal nature and chi-square
tests were applied to the data. Using this procedure, no significant differences were
found between the informal and comparison groups in the percentage of pupils
assigned to the four role-taking categories, instructional reading level categories, and
independent reading level categories. However, the ratings assigned independently by
the teachers did show different distributions (χ² = 14.24; df = 4; P < 0.01). Teachers in
the informal group rated a higher percentage of their students as reading beyond a
grade one level than teachers in the comparison group (15.9 per cent vs. 5.9 per cent)
and a lower percentage as reading at the grade one level (47.1 per cent vs. 61.0 per
cent), a finding at variance with the scores from the standardised group and
individually administered reading tests.


DISCUSSION 


While it is erroneous to conceive of the two groups of classrooms studied as 
falling at opposite ends of a continuum with respect to the observational categories 
showing significant differences, the results of the first part of the study did demonstrate 
that children in the two groups engaged in various instructional activities to different 
degrees. Moreover, the characteristics of the classrooms we have labelled ‘informal’
are very much in accordance with those mentioned in other papers discussing informal 
education (for example, Myers and Duke, 1977; Walberg and Thomas, 1974) such as 
child-initiated learning, freedom of movement and talk, abundance and diversity of 
manipulatory materials, encouragement of play as productive learning, and individ- 
ualised instruction. In addition, descriptions of British infant classes by Brandt (1972) 
and Resnick (1972) match our data on the informal classes. Specifically, Brandt's 
observational study of two infant schools in North-west London indicated at least half 
of the day to be spent in independent activities, 20.4 per cent to be spent in peer
interaction, and 6.4 per cent at fantasy play. These percentages are surprisingly
similar to ours, given Brandt's smaller sample and different observation procedures. 


Our data on pupil outcomes are also consistent with a number of recent com- 
parative studies reporting higher scores in reading and/or mathematics in more formal 
settings (for example, Bell et al., 1976; Bennett, 1976; Bereiter and Kurland, 1978; 
Solomon and Kendall, 1976; Stallings, 1975; Stebbins et al., 1977). The only finding 
favouring the informal group was that of the reading ratings assigned by the teachers 
who may or may not have used similar criteria or standards in assigning these ratings. 
It was also observed that, on the various reading and mathematics measures, the 
variation of classroom scores was generally greater for the informal group. The study 
did not examine why this was the case, but an inspection of the data suggests that the 
informal or individualised approach was relatively more effective in upper income
neighbourhoods and relatively less effective in lower income neighbourhoods. 


Equally important as the results in reading and mathematics, however, is the lack 
of significant differences in the other areas where it was felt that an activity and self- 
discovery approach to learning might lead to greater cognitive growth. Although on 
the reading and mathematics achievement measures, two or three of the informal 
classes scored lower than any of the seven first-grade comparison classes, the
distributions of scores for the other test measures (for example, problem-solving, language,
classification) were more comparable. While this study is obviously no test of
Piagetian theory or its application to education (see Brainerd, 1978; Anthony, 1977) 
it should serve to reduce any assumptions one might have that simply giving children 
ample opportunity independently to explore manipulatory materials and to interact 
with their peers will result in a significant increment in cognitive growth over that 
found in more traditional classrooms. 


In addition to comparing two groups of classrooms as a form of programme 
evaluation, on the basis of the examination of 20 classrooms the study also docu- 
mented the wide variation which occurs within a single grade within a single school 
district. While the standard deviations for the various pupil activities presented in the 
tables for Part I give some indication of the substantial variation found across class- 
rooms, they under-represent the extremes which were often observed. For example, 
activities requiring the reading of text ranged from as little as 1.7 per cent of one
classroom's activities to 22.1 per cent of another's; similarly, fantasy play activities ranged
from 0 to 17.1 per cent. Numerous recent studies as discussed by Bennett (1978) have
indicated that learning is positively related to the pupil's engagement or on-task time 
in that learning area; indeed, the observational procedure used in this study was based 
on this very assumption. We would hope that future studies, rather than taking a 
comparative approach, might focus on variations in pupil activities, not only to relate 
them to learning outcomes, but also to document for the lay person just how different 
classrooms in any given grade can be. 
KINDERGARTEN PROGRAMMES AND THE YOUNG
CHILD’S TASK ORIENTATION AND UNDERSTANDING 
ABOUT TIME SCHEDULING 


By B. C. NASH 
(Ontario Institute for Studies in Education) 


Summary. In a cumulative study over five years, detailed evaluation of kindergarten 
programmes was made along the dimensions of teacher treatment of time, classroom 
spatial arrangement, equipment and materials, and communication between people. 
After preliminary studies, four types of programme varying in time scheduling and 
planning for and reinforcement of task orientation were defined. The effects of the 
programmes on four- and five-year-old pupils were observed. 

As predicted, the task orientation of the children in both age groups varied with the 
degree of attention to programme planning to enhance it. Detailed description of the 
task orientation enhancing programme is provided. Girls in all programmes increased 
their task orientation more than boys, and in regular programmes decreased their task 
orientation in response to teacher-planned interruptions. 

In the enhancing programmes, children felt more responsibility for their own 
learning. It is concluded that the climate of the classroom for young children is set 
mainly through the treatment of time. The early part of the school year is the period when
task orientation can be increased and during the second half of the year it is consolidated 
or decreased. In programmes of the type most commonly used in North America and 
Great Britain, task orientation was found to decrease between January and May for 
almost half the children. 


INTRODUCTION 


Many studies of the outcomes of programmes for young children have failed to 
provide detailed descriptions of the characteristics of the programmes (Walsh, 1931; 
Andrus and Horowitz, 1938; Moore et al., 1972). Yet anyone who has visited a large 
number of early childhood settings will be aware of the variations among supposedly 
similar programmes. Even with a prescribed programme as in North American 
versions of a Montessori programme, the interaction of teachers and children and 
materials varies from setting to setting. Charters and Jones’ (1973) caution about the 
danger of evaluating non-events in education prompts continued attempts to find ways 
of observing and describing programmes accurately, so as to describe their outcomes 
in terms of child behaviour and learning. This article describes the results of one 
segment of a systematic analysis conducted during the development implementation 
phase of an approach to kindergarten teaching. 


Preliminary observations of, and discussions about, programmes for young 
children revealed a tendency for teachers and administrators to schedule time with 
more regard for the needs of the school, the teacher, or the programme content, than 
for any objectives related to the child's learning to use or understand time. This was 
revealed in the comments made to the investigator when time was discussed. A 
feeling of helplessness prevailed. “We don't understand the child's view of time. We
cannot plan our programme around something we do not know about.”


Discussion with several hundreds of teachers revealed that programmes had no 
particular objectives in relation to time, at least none that were related to any planned 
teaching strategy. But most teachers thought it would be desirable for young children 
to learn to play constructively for long enough to achieve their own objectives for play,
in contrast to the more distractable behaviour they saw in their classrooms. 


This led to the use of the concept of Task Orientation defined as the time a child
spends working spontaneously and in an engrossed fashion at a task of his own 
choosing. Preliminary observations in 20 classrooms suggested that the child’s task
orientation tended to decrease from the beginning to the end of the kindergarten year
in many programmes. Continued observation led to the notion that the reduction in 
task orientation might be related to interruptions timetabled into the programme. 
Children’s conversations supported this assumption. 


“Come and help me find all the pink stones in this puddle.”


“No. As soon as we get started, she'll make us stop for juice and cookies or
bathroom.”


The interruptions could be due to a combination of factors such as short time 
periods allowed for activities, with teacher directed changes, and little warning to 
children, most of whom cannot tell the time, about preparing for the end of a task or 
not beginning a new one. Or, perhaps, as the child gains confidence in the stimulating 
environment of kindergarten, he becomes more adventurous in choosing new activities. 
His newly gained confidence thus leads him to become more distractable in the absence 
of any incentive to develop longer task orientation. 


Such observations led the author to begin to analyse educational settings for
pre-school and kindergarten age children in terms of four dimensions, Time, Space,
People, and Things. The analyses of different teachers’ treatment of the dimensions 
of programming and of children's behaviour in various programmes became the basis 
for the development of a “Learning Environment” approach to programming.
Among the objectives of the approach is that of enhancing children's understanding 
and use of time. Depending on the level of understanding of, and commitment to, the 
objectives, different teachers have achieved varying levels of use of the programme 
(Hall and Loucks, 1976). 


The investigation examines several aspects of the interrelationship between the 
teacher’s treatment of time and the behavioural and affective response of the child. 
First, and most obviously, there are the relationships between the degree to which 
time-scheduling and time management by the teacher allows for the satisfactory 
completion of tasks, and the actual amount of time spent by children at a task. This 
is the Task Orientation study. Related to this are three developmental tasks which 
may be facilitated or inhibited in the child. Firstly, there is the development of the 
child’s ability to plan and use time constructively. Secondly, there is the relationship 
between the teacher and child’s time management and the child’s developing under- 
standing of what could be achieved in a short period of time, such as five minutes. 
Finally, there is the matter of classroom climate seen through the relationship between 
the teacher’s approach to time management and the child’s view of himself as an 
autonomous learning individual or as a non-directed or other-directed person in an 
organisation. 


The studies reported here resulted from cumulative observations of programmes 
and children over a five-year period. The assumption that an increase in task orienta- 
tion in play is of positive value goes together with an assumption that the child is 
interacting with people and learning materials in spaces planned to facilitate learning. 


METHODS 


Programme observation and description. Observation schedules were devised with 
the aim of describing kindergarten programmes in terms of the treatment of space, 
people, and materials as well as of time (Nash, 1977). For each dimension information 
was obtained from the teacher, from classroom observation, and from conversations 
with children. 


The information from teachers on programmes in which pupils in this study were 
located was gathered at three points during the year, late September, late January, and 
late May. The programmes were observed for a minimum of three sessions each at
the same points in the year. All but two programmes were observed for five complete 
sessions. A complete session would be a morning or an afternoon. There were 
conversations with each child at the same times of the school year. Videotape 
recordings were made of one complete session in each class of children, observed at 
the mid-point of that session. The observers were trained by using videotapes which 
showed examples of the four programmes described. 


The treatment of time was described partly through the scheduling of time. The 
flexibility of timetabling was examined, as was the availability of activities at all times. 
Seven subsections of the observation schedule were concerned with the encouragement 
or discouragement of task orientation. Other sections were about the encouragement 
of individual planning of tasks by children, and about timing devices. The conver- 
sations with children were included to assess their understanding of time, of the 
scheduling of time (if any), and of their objective for the tasks at which they worked. 


Throughout all of the programme observations made over a five-year period and 
involving over 200 classrooms, it has become clear that most teachers of young 
children are remarkably consistent over long periods in their treatment of time. This 
consistency probably relates to the teacher’s basic philosophy of early childhood, and 
in many cases to a general lack of critical analysis of the time dimension of pro- 
grammes. Whatever the reason, this phenomenon works to the advantage of those 
studying programmes and their outcomes. The relatively small size of kindergarten 
classes compared with higher elementary grades has also been helpful since it has been 
fairly easy to talk with all of the children in any of the programmes, where detailed 
data were required at several points in the school year. 


The educational settings and the subjects. Three separate studies are reported, all
of them based on observations of four- or five-year-old children in public schools in 
both urban and rural settings in Ontario. Except for the pilot project, observations 
were all of the children who were in a class for a whole year. The programmes were 
all half day, for five days a week, and usually had between 17 and 26 children each 
half day. All of the classes had a mixture of social, economic and ethnic backgrounds. 
The spatial arrangement of all classrooms was essentially similar, and the same 
learning centres were available in each. 


Task orientation observations. Each child was observed at three consecutive 
sessions in the pilot project and at two consecutive sessions at each point in the year. 
Observations were made in late September, late January, and late May. The children 
were observed at open-ended activities common to all settings—painting at easels, sand 
play, water play, manipulation/co-ordination games, collage. All observations were 
begun a short time after the start of the ‘free play’ session and in mid-week.


A child’s task orientation for any point in the year would be an average of his task 
orientation at the two or three consecutive sessions. As well as an averaging of 
duration of task orientation for the total group, results were analysed for each 
individual child to see whether task orientation had increased, decreased or remained 
constant. A multi-variate analysis was made of the results of the four-programme
study.
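The per-child scoring described above can be sketched as follows in Python. The durations, the use of minutes as the unit, the `tolerance` parameter, and all function names are hypothetical illustrations, not data or procedures from the study:

```python
from statistics import mean


def point_score(session_durations):
    """Average task-orientation duration over the two or three
    consecutive sessions observed at one point in the year."""
    return mean(session_durations)


def classify_change(earlier, later, tolerance=0.0):
    """Label the change between two observation points.
    `tolerance` is an assumed allowance for treating small shifts as constant."""
    difference = later - earlier
    if difference > tolerance:
        return "increased"
    if difference < -tolerance:
        return "decreased"
    return "constant"


# Hypothetical durations (in minutes) for one child at the three observation points.
observations = {
    "September": [4.0, 6.0],
    "January": [9.0, 11.0],
    "May": [8.0, 8.0],
}

scores = {point: point_score(durations) for point, durations in observations.items()}
print(classify_change(scores["September"], scores["January"]))  # increased
print(classify_change(scores["January"], scores["May"]))        # decreased
```

Classifying each child separately in this way keeps individual declines visible even when the group average rises.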


THE PROGRAMMES AND OUTCOMES 


A. Measures of Task Orientation 

The pilot project. The aim of the initial investigation was to assess the effective- 
ness in increasing task orientation of a programme designed to reduce unnecessary 
interruptions to the child's play. In the ‘experimental’ programme various strategies
were used to decrease interruptions to the child's involvement in an activity. The first
and obvious one was in timetabling. Table 1 shows the timetable in the experimental
programme contrasted with that in a so-called regular programme of which a large
number of examples were available.


TABLE 1

OPEN AND INTERRUPTED TIME SCHEDULES IN EARLY CHILDHOOD PROGRAMMES

Time     Experimental Programme (A, B, C)    Regular Programme (D)
 9:00    Free play                           Early exercise
 9:15                                        Free play
 9:30                                        Gym
 9:45                                        Free play
10:05    Snack available                     Snack time
10:15                                        Free play
10:30                                        Musical games
10:45                                        Story time
10:50    Centres gradually closed off
11:00    Movie or musical activity
11:10    Outdoor play                        Outdoor play
11:25    End of session


Other unnecessary interruptions to free play activities were eliminated. Activities 
likely to interest most of the children such as films or musical games or outside play 
were put at the end of the morning. As the time for these approached, the various 
activity centres were closed one by one as they became deserted. Thus the children
would be discouraged from starting an activity too late in the morning. No child was 
asked to leave an activity for another, and care was taken not to arrange too many new 
activities that would compete for a child's attention on any single day. When a child 
showed no sign of losing interest in a task five minutes before an unavoidable inter- 
ruption, he was warned of the impending need to finish so that he would not get into 
the predicament, “But I can't put away the puzzle and get ready for gym!” In
addition, a planning board was used to encourage the recognition of the beginning and 
end of an activity. The child would place a picture card of an activity beside his name 
on the board as he began it and remove it as he finished. This experience was designed 
to enable the child to gain some insight into planning his activities. The teacher or 
research officer would comment favourably on a child's increasing tendency to play 
constructively and to complete tasks. The presence of the research officer facilitated 
adherence to all of the strategies aimed at increasing task orientation. 


Observations were made of 20 morning and 20 afternoon children in the experi- 
mental programme, all of those who attended the experimental programme for the 
whole year. In each of three regular timetabled programmes, seven morning and 
seven afternoon children, randomly sampled, were observed (N = 42). Table 2 shows
the task orientation duration for experimental and regular programmes at each of the 
observation points during the year. The children were four years of age. 


An average task orientation duration is given for each group as well as the range
in seconds of individual task orientations. The individual task orientations are the
average taken over the three observation times. All 40 of the children in the experimental
programme increased their task orientation time from September to January
to May. Only half of those in the regular programme showed consistent improvement;
the remainder regressed. The results are reported for the total sample rather
than by sex because there was little difference between boys and girls in this study. A
replicatory study in the following year yielded similar results with samples of 50
four-year-old children in task oriented and 42 four-year-old children in non-task oriented
programmes in rural settings.


TABLE 2
TASK ORIENTATION DURATION OF FOUR-YEAR-OLDS IN EXPERIMENTAL
AND REGULAR PROGRAMMES

                        Late September          Late January            Late May
                        Group     Range of      Group     Range of      Group     Range of
                        Average*  Individual    Average*  Individual    Average*  Individual
Group                             Averages*               Averages*               Averages*
Experimental (N = 40;   201       30-310        607       127-967       872       188-1024
  28 boys, 12 girls)
Regular (N = 42;        233       35-372        422       308-1514      366       476-1636
  29 boys, 13 girls)

* Task Orientation Duration in Seconds


A comparison of four programme variations. Subsequently, based on the results 
of these studies in further settings similar to those described, the strategies to enhance 
task orientation became part of the Learning Environment approach. This approach 
was published and implemented to varying degrees by teachers in many junior and 
senior kindergartens. Implementation of the programme on a broad scale provided 
the opportunity for observation in classrooms where only some of the task orientation 
strategies were utilised (Nash, 1976). 


Four types of programme along a continuum of attention to task orientation
were observed. Programme A was the experimental programme described above.
Programme D was the regular programme, timetabled as in Table 1 and with no specified
objectives about task orientation. Programmes B and C were intermediate between
these. Programme C ran on a timetable like that described for the experimental
programme in Table 1, but no other strategies for enhancing task orientation were
employed except for the occasional remark to a child that he was working hard at
something. In programme B at least one hour of free choice play on the experimental
timetable was allowed, together with the use of a planning board, but the
teacher did not make a point of rewarding or reinforcing the child's concept of the end

TABLE 3
TASK ORIENTATION FOR FOUR-YEAR-OLDS IN PROGRAMMES VARYING IN T-O
TEACHING STRATEGIES

                  Late September          Late January            Late May
Programme         Average    Range of     Average    Range of     Average    Range of
Type and          in         Individual   in         Individual   in         Individual
Sex (N)           Seconds    Averages     Seconds    Averages     Seconds    Averages
                             in Seconds              in Seconds              in Seconds
A Girls (17)      201        38-374       656        307-1501     850        511-1602
B Girls (20)      241        56-427       511        271-677      632        497-661
C Girls (19)      206        50-312       402        241-617      451        196-734
D Girls (22)      232        48-378       496        493-1039     536        200-864
A Boys (21)       208        35-318       623        137-981      836        187-1011
B Boys (18)       199        61-444       503        192-612      644        314-667
C Boys (19)       261        60-318       441        200-617      508        205-721
D Boys (18)       220        40-312       438        318-1150     441        212-1087

Average Age in January: 5 yrs. 6 m. Range: 5 yrs. 0 m. to 5 yrs. 8 m.


TABLE 4
TASK ORIENTATION FOR FIVE-YEAR-OLDS IN PROGRAMMES VARYING IN T-O
TEACHING STRATEGIES

                  Late September          Late January            Late May
Programme         Average    Range of     Average    Range of     Average    Range of
Type and          in         Individual   in         Individual   in         Individual
Sex (N)           Seconds    Averages     Seconds    Averages     Seconds    Averages
                             in Seconds              in Seconds              in Seconds
A Girls (19)      487        236-931      3150       438-3540     3201       1028-3540
B Girls (16)      558        261-677      3091       678-3394     3102       1042-3390
C Girls (21)      558        233-801      1757       847-1874     1874       704-1832
D Girls (19)      579        439-981      871        371-1042     906        214-972
A Boys (18)       548        203-896      1408       315-1648     1516       548-1759
B Boys (23)       500        203-618      1240       381-1515     1294       581-1457
C Boys (22)       440        221-801      1210       677-1638     1277       487-1567
D Boys (24)       619        196-868      967        314-1039     1028       271-961
FIGURE 1
A COMPARISON OF TASK ORIENTATION DURATION FOR FOUR-YEAR-OLD
BOYS AND GIRLS IN FOUR PROGRAMMES


FIGURE 2
A COMPARISON OF TASK ORIENTATION DURATIONS FOR FIVE-YEAR-OLD
BOYS AND GIRLS IN FOUR PROGRAMMES

of an activity. Used in this way the child tends to note the beginning of an activity
rather than the end. Thus the notion of planning and beginning a task could become 
more important than that of finishing it. 


Two half-day classes were observed for each type of programme. Table 3 shows 
the average task orientation durations and the range of task orientation times for 
four-year-olds in junior kindergarten classes. Table 4 shows the results for five-year- 
olds. None of those observed had attended school prior to entry to school in the year 
in which the observations were made. Figure 1 presents a graph of average T-Os 
derived from Table 3 and Figure 2 those from Table 4. 

Table 5 is a summary of the statistical analysis of comparisons of programmes A, 
B, C, and D for four- and five-year-old children. The frequency distributions (number 
and percentage of cases by score point) along with means, variance and standard 
deviations were computed for the major variables in the study. Each distribution was 
broken down by classroom (1-16). Then variables ONE, TWO, and THREE were
analysed. These are the activity scores at the three points in time.


TABLE 5
THE STATISTICAL ANALYSES OF COMPARISONS OF THE FOUR PROGRAMMES IN TWO PROGRAMME YEARS
(N = 316 IN 16 CLASSES)

                                                        Log Time 1         Log Time 2/Time 1   Log Time 3/Time 2
                                                        Initial Status     First Growth        Second Growth
Source of           D.F.   Error    F          df         M.S.     F         M.S.    F           M.S.    F
Variation                  Term
Programme           3      T/PY     4.03**     9/14.75    0.05     1.45      1.50    14.60***    0.04    7.04**
Year                1      T/PY     166.27***  3/6        13.37    224.82    0.47    4.60*       0.17    30.71***
Teacher/Year,       8      R        2.21***    24/818.49  0.06     1.74*     0.10    3.97***     0.01    0.74
  Programme
Sex                 1      ST/PY    14.07***   3/6        0.00     0.01      0.81    8.39**      0.02    0.96
Year/Sex            1      ST/PY    15.73***   3/6        0.03     0.40      0.42    4.33*       0.00    0.09
ST/PY               8      R        2.07***    24/818.49  0.06     1.79*     0.10    3.73        0.02    2.05**
Replications        284                                   0.0342             0.0259              0.0076

* P<0.10; ** P<0.05; *** P<0.01


The last variable was calculated as log10 (THREE/ONE).† It is a measure of the
increase in activity from time 1 to time 3. In this analysis, the three analytic variables
are LONE (log10 of time 1 activity), LIN1 (log10 of the ratio change from time 1 to
time 2) and LIN2 (log10 of the ratio change from time 2 to time 3). That is, there is one
initial status variable and two indications of growth.
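
The three analytic variables can be illustrated with a minimal Python sketch. The function name and the choice of input are assumptions made for illustration only: the durations 201, 656 and 850 seconds are the group averages for Programme A girls in Table 3, whereas the study presumably computed the variables for each individual child.

```python
import math

def analytic_variables(one, two, three):
    """Return (LONE, LIN1, LIN2) for T-O durations in seconds at times 1, 2 and 3.

    Hypothetical helper, not taken from the study itself.
    """
    lone = math.log10(one)          # initial status
    lin1 = math.log10(two / one)    # first growth: time 1 to time 2
    lin2 = math.log10(three / two)  # second growth: time 2 to time 3
    return lone, lin1, lin2

# Illustrative input: Programme A girls, group averages from Table 3.
lone, lin1, lin2 = analytic_variables(201, 656, 850)

# On the log scale the two growth terms are additive:
# LIN1 + LIN2 equals log10(THREE / ONE), the overall increase.
total_growth = lin1 + lin2
```

Because ratios become differences after the log transformation, the overall increase from time 1 to time 3 is simply the sum of the two growth variables.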


The means, variances, and standard deviations for the three analytic variables
(LONE, LIN1 and LIN2) were tabulated for combinations of the variables Year, Programme,
Teacher and Sex. Note that for each variable, the tabulation is divided into three
parts: Year by Programme by Teacher by Sex, Programme by Sex by Year, and Sex
by Year. The multivariate analysis of variance is based on three dependent variables
(LONE, LIN1 and LIN2) and on the four factors (Year, Programme, Teacher and
Sex). In the design, Teacher is random and nested within Year and Programme.
Pupils are random and nested in Year, Programme, Teacher and Sex.


While the graphic representations of Tables 5 and 6 show the increases in T-O 
in the various programmes from October to January, Table 6 is intended to show how 
decreases in T-O from January to May contribute to the levelling off process. 


† A log analysis was used for simplicity, in that variation in treatment groups is not correlated
with the means, so the simpler, additive method holds.


TABLE 6
INCREASES, STABILISATION AND DECREASES IN T-O FROM JANUARY TO MAY
IN FOUR PROGRAMMES

Year        Programme         T-O Increase   T-O Stable   T-O Decrease   Total
Junior K    A      Boys       16             0            1              17
                   Girls      21             0            0              21
            B      Boys       17             1            2              20
                   Girls      18             0            0              18
            C      Boys       13             2            4              19
                   Girls      15             0            4              19
            D      Boys       6              0            12             18
                   Girls      8              2            12             22
Senior K    A      Boys       17             0            1              18
                   Girls      10             0            0              10
            B      Boys       9              1            13             23
                   Girls      8              2            6              16
            C      Boys       16             1            5              22
                   Girls      20             0            1              21
            D      Boys       8              0            16             24
                   Girls      10             0            9              19
Total                         212            9            86             307
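

The counts in Table 6 can be pooled by programme to give the share of children whose T-O decreased from January to May. The following Python sketch is illustrative only; the dictionary layout and the function name are assumptions, and the counts are copied from the table as printed.

```python
# (increase, stable, decrease) counts from Table 6, in the order
# Junior K boys, Junior K girls, Senior K boys, Senior K girls.
counts = {
    "A": [(16, 0, 1), (21, 0, 0), (17, 0, 1), (10, 0, 0)],
    "B": [(17, 1, 2), (18, 0, 0), (9, 1, 13), (8, 2, 6)],
    "C": [(13, 2, 4), (15, 0, 4), (16, 1, 5), (20, 0, 1)],
    "D": [(6, 0, 12), (8, 2, 12), (8, 0, 16), (10, 0, 9)],
}

def decrease_share(rows):
    """Proportion of children in the pooled rows whose T-O decreased."""
    decreased = sum(row[2] for row in rows)
    total = sum(sum(row) for row in rows)
    return decreased / total

# Share of decreasing children per programme, pooled over sex and year.
shares = {programme: decrease_share(rows) for programme, rows in counts.items()}
```

Pooling over sex and year keeps the comparison at the level of programme type, which is the contrast the table is meant to bring out.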


A comparison of Programme A with two groups of five-year-olds. In Ontario, 
attendance at kindergarten is not compulsory. Some school systems have kinder- 
gartens for five-year-olds and others have started classes for four-year-olds. In an 
attempt to investigate further the discrepancy between task orientation for boys and 
girls in the senior kindergarten programme a comparison was made between a group 
of children who had been in an A type programme from age four and those entering an 
A programme at five years. Table 7 shows the results. 


TABLE 7
TASK ORIENTATION DURATION OF FIVE-YEAR-OLDS IN PROGRAMME A FOR ONE YEAR
AND FOR TWO YEARS

                  Late September          Late January            Late May
Programme         Average    Range of     Average    Range of     Average    Range of
and               Duration   Individual   Duration   Individual   Duration   Individual
Sex (N)           in         Average      in         Average      in         Average
                  Seconds    in Seconds   Seconds    in Seconds   Seconds    in Seconds
1 year
  Boys (18)       548        203-896      1408       315-1648     1516       548-1759
  Girls (19)      547        236-931      3150       483-3540     3201       1028-3540
2 years
  Boys (12)*      762        250-908      1812       370-2952     2412       847-3252
  Girls (11)*     840        430-1504     2530       1097-3066    3370       1276-3490

* 1 Half-Day Programme observed.


B. The Child’s Developmental Tasks in Relation to Time 

Task understanding. During the pilot project it was established that for most 
four-year-old children there are two kinds of tasks, the open-ended and closed-ended. 
The closed-ended task would be work on a puzzle where the completion of the task is
marked by getting all the pieces together. Another type of closed-ended task would
be one where an activity is begun and ended at the request of the teacher, for example,
“I want you to bounce the ball until I blow the whistle.” An open-ended task is most
easily exemplified by work of a creative nature, painting, collage or woodwork. To
find out what the child thought about an open-ended task he was approached while
working at a creative task and asked “Tell me about your picture” (or other art work).
During the conversation about the picture at some point he would be asked “How do
you know when you've finished?” Responses to the latter question could be categorised
according to whether the child understood either the task or the question, or
had his own plans for using time, or believed that his teacher decided when the task
was finished. In programmes of type A, most four- and five-year-olds had set them- 
selves criteria for the completion of a task and had planned using them. In D type 
programmes only two or three of each age group had done so. Programme type B 
and C were intermediate in numbers of children understanding or failing to understand 
tasks. 


Understanding of five minutes. Each child was asked “How long is five minutes?”
and “What could you do in five minutes?” Responses to the first question were of
three types. Some children had no idea how long five minutes was. Some gave
realistic responses in terms of what could be achieved in five minutes and others gave
unrealistic responses. No four-year-olds understood the question and only 13 of the
153 five-year-olds questioned gave realistic responses. Hence the second question,
about what could be done in five minutes, was asked; responses to it can be categorised
as realistic, unrealistic or non-committal. Table 8 shows the distribution of these
responses in the four programmes.


TABLE 8
REALISTIC AND UNREALISTIC ASSUMPTIONS ABOUT FIVE-MINUTE ACTIVITIES BY
FIVE-YEAR-OLD CHILDREN IN FOUR TYPES OF PROGRAMME*

Programme Type    Non-Committal    Unrealistic    Realistic    Total
A                 6                2              28           36
B                 10               11             16           37
C                 10               27             13           50
D                 29               8              3            40
Total             55               48             60           163

* No four-year-olds understood the question. χ² = 71.18, df = 11, P<0.001.
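

A chi-square test of association on a table such as Table 8 can be sketched in a few lines of Python. This is a minimal illustration, not the author's computation: the function name is hypothetical, the statistic is the ordinary Pearson chi-square, and the degrees of freedom follow the usual (rows - 1)(columns - 1) rule, which is an assumption of the sketch. The output is therefore not expected to reproduce the printed values exactly.

```python
# Observed counts from Table 8: (non-committal, unrealistic, realistic).
observed = {
    "A": (6, 2, 28),
    "B": (10, 11, 16),
    "C": (10, 27, 13),
    "D": (29, 8, 3),
}

def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count under independence of programme and response type.
            expected = row_total * col_total / n
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, dof

chi2, dof = pearson_chi_square(observed)
```

The largest contributions come from the realistic responses in programme A and the non-committal responses in programme D, which is where the observed counts depart furthest from what independence would predict.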


Understanding of scheduling and feelings of autonomy. Initially the questions 
about the child's understanding of schedule and his feelings of self-directedness were 
thought to be unrelated. However, the responses were such that it made no sense to 
separate them. One of the questions was, “You've been working at this —— for a long
time. Why?” Responses to this were either “I don't know”, or “Because I like
it”, “Because I want to” or other words to this effect, or “Because I have to”, or
“Because the teacher told me so.” Other parts of the conversation with the child
which were used to clarify the responses to the latter questions were: “Do
you enjoy coming to school?” “Tell me what you do first at school in the mornings
(afternoons).” “Then what?” “Why?” “Then what?” “Why?”


Responses here would sometimes begin with “I don't know” and change to
“Because teacher said so.” In response to the latter remark, the question “Is that
why you don't know?” elicited the response “Yes, of course” or “It's the same
thing.” Tables 9 and 10 present the results in terms of feelings of self-directedness
where the child states personal choice as a reason, of non-directedness where the child
does not know why he engages in an activity but does not suggest external direction,
and other-directedness where responsibility for choice appears to the child to be in the
hands of the teachers.


TABLE 9
FOUR-YEAR-OLD CHILDREN'S FEELINGS OF SELF-DIRECTEDNESS, NON-DIRECTEDNESS
AND OTHER-DIRECTEDNESS ABOUT TASK CHOICE IN FOUR TYPES OF PROGRAMME

Programme Type    Self-Directedness    Non-Directedness    Other-Directedness    Total
A    Girls        10                   7                   0                     17
     Boys         16                   3                   0                     19
B    Girls        10                   5                   3                     18
     Boys         10                   8                   0                     18
C    Girls        1                    8                   9                     18
     Boys         6                    11                  1                     18
D    Girls        0                    1                   19                    20
     Boys         1                    3                   14                    18
Total             54                   46                  46                    146

Combining responses of boys and girls χ² = 96.09, df = 11, P<0.001.


TABLE 10
FIVE-YEAR-OLD CHILDREN'S FEELINGS OF SELF-DIRECTEDNESS, NON-DIRECTEDNESS
AND OTHER-DIRECTEDNESS ABOUT TASK CHOICE IN FOUR TYPES OF PROGRAMME

Programme Type    Self-Directedness    Non-Directedness    Other-Directedness    Total
A    Girls        10                   6                   2                     18
     Boys         16                   2                   0                     18
B    Girls        2                    12                  2                     16
     Boys         16                   4                   1                     21
C    Girls        2                    8                   10                    20
     Boys         12                   8                   0                     20
D    Girls        0                    2                   17                    19
     Boys         2                    3                   16                    21
Total             60                   45                  48                    153

Combining responses of boys and girls χ² = 170.16, df = 11, P<0.001.


DISCUSSION AND CONCLUSIONS 


Perusal of Tables 2-6 reveals several points of interest. While both boys' and 
girls' task orientation duration increases in the first half of the year in all programmes
and then levels off, the increase is quite small in programme D for both boys and girls 
in each age group. Yet programme D is the typical North American kindergarten 
programme in which children are not encouraged to take responsibility for the use of 
time. Programmes with similar characteristics have been observed by the author in 
all parts of Britain and in France and Switzerland within the past three years. The 
levelling off is less for four-year-olds which, together with the figures for the five-year- 
olds, probably indicates a developmental trend. Obviously the developmental trend 
can be enhanced by rational programme planning and more importantly can be 
suppressed by other kinds of programmes. It must be remembered that these are 
average durations for groups of around 20 children for each of whom three observa- 
tions were made at each point in the year. These are averages of between 48 and 72 
observations, yet the means are distinguishably different for each group. The trend 
analysis confirms the results. 
More disturbing evidence of the effect of typical programmes of type D was seen
in Table 6. More than half of the pupils in programme D decreased their T-O from
January to May. It is suggested that large increases in T-O can be achieved by teachers
in the first half of the first school year. The period January to May can be regarded 
as a consolidation time for the effects of the programme. There is usually a greater 
discrepancy in programme D and, as would be expected, this is allied with the fact that 
a significant number of children in programmes of type D reduced their T-O from 
January to May. At the same time, no children in programme A, a programme 
rationally planned to enhance task-orientation, decreased their T-O duration. The 
trend for girls to be more affected than boys by programmes both in positive and 
negative directions merits further investigation, perhaps with other behavioural or 
personality measures.


These findings suggest serious implications for kindergarten programming. A 
suggestion has been made that the kindergarten child’s increasing security gives him 
freedom to wander from one activity to another. Yet rich environments were pro- 
vided in the experimental programmes. The children’s response to the wealth of 
activity was to choose carefully and to try different activities on different days, usually 
maintaining a high level of task orientation regardless of task. 


The qualitative correlates of the various programmes may be seen as startling and 
disturbing. The time dimension of the programme is the one through which the 
climate of a classroom, the interrelationships between teacher and child can be 
assessed. It is not far-fetched to suggest that at this early age the child begins to 
develop his sense of himself as an autonomous learning person through the respect 
shown him by adults, as they relate to him about his use of time. His ideas about what 
is expected of him at school are developed. To assume adult notions of time in the 
child, or to assume he understands nothing of it, and for either reason to deny him ex- 
planations of what is happening to him is to abdicate responsibility for helping him to 
achieve a sense of time as a work study skill. We see examples of this in the children’s 
comments about time, interpreted in Tables 9 and 10 as self- and other-directedness. 


There have been few studies of the effects of overly teacher-directed time- 
scheduling. The consequences of allowing the young child large blocks of time for his 
work have been described briefly by Elkind (1969). The effects of programme D on 
children were predicted by Elkind, but the present study is a first attempt to investigate 
the correlates of programming in a rigorous manner. 
GIRLS AND MATHEMATICS: PARENTAL VARIABLES


By PEGGY STAMP 
(Department of Educational Research, University of Lancaster) 


SUMMARY. 234 girls taking A-level mathematics were compared with 265 girls taking 
A-level French. They were more reserved (A), stable (C), tough-minded (I), radical 
(Q1) and group-dependent (Q2) on the 16PF, and less feminine on the CPI Fe Scale. 
Girls in both subjects tended to identify with their fathers rather than mothers. Father 
identification is related to tough-mindedness and masculinity on tests, but with typically 
feminine leisure activities. So is choice of mathematics. Subject choice is related to both 
parents’ attitudes, but subject attitudes are related only to the mothers’ attitudes. 
Double Mathematics girls were both more conventional socially and more radical 
mentally than either of the other groups. 


INTRODUCTION 


THE proportion of girls taking mathematics in the pre-university school year ranges 
from 1 in 8 in Belgium to 1 in 3 in Scotland, but everywhere in the West it is less than 
half (Keeves, 1973). In England and Wales in 1975 it was 2 in 9 (DES, 1975). 
Although Russia pursues a deliberate policy of equality between the sexes, and 
encourages mathematics for both, males considerably outnumber females at higher 
levels and in the Olympiads (Krutetskii, 1976). 


The widespread nature of this imbalance has led some investigators to suggest 
that it is the result of innate differences in ability between the sexes. Buffery and Gray 
(1972) claim that the observed difference in spatial ability is the result of a delayed 
lateralisation of cerebral dominance among males, which enables their non-dominant, 
right hemisphere to achieve a higher level of spatial understanding. Whether they are 
correct in their thesis that spatial ability depends on a recessive gene attached to the 
X-chromosome, it is consistent with the greater spatial ability shown by boys (Smith, 
1964; Vernon, 1950). Witkin (1973) mentions current research into the possibility 
that field independence, which can be seen as both an aspect of spatial ability and a 
cognitive style, is similarly related to the X-chromosome. It seems likely that some 
such explanation will be found to account for the consistent sex difference in this 
ability, but its importance for mathematics has yet to be established (Very, 1967). 


Most research has failed to reveal a specifically mathematical ability (Barakat, 
1951; Smith, 1964), though recently Cooley and Lohnes (1977), in a factorial analysis 
of Project TALENT data, have isolated one, which they designate as a “ knowledge 
factor, uncorrelated with general intelligence ”. Krutetskii (1976), on the other hand, 
found that all the mathematically gifted children he examined scored high on a 
general factor of intelligence, and also on a verbal-logical factor, but not necessarily 
on a visual-pictorial factor. 


In any case, intellectual differences between the sexes are slight by comparison 
with personality differences (Tyler, 1969). Maccoby (1967) notes that in many cases 
the relationship between personality variables and performance on intellectual tasks 
is curvilinear, with a moderate score being the optimum, but scores for boys and girls 
being differently distributed. Thus anxiety is a disadvantage for girls’ intellectual 
development, but an advantage for boys’; dependency is a particularly undesirable 
trait for boys’ IQ; aggression is more advantageous for girls; the more feminine boys 
and more masculine girls are likely to have higher IQ scores as well as perform better 
on tests of creativity and analytical thinking. 


Most personality factors do not seem to be strongly related to success in mathematical
performance, but Aiken (1973) mentions several which have been associated:
responsibility, independence, low impulsivity, and reflectiveness. There has been a
fairly consistent pattern for scientists (Entwistle and Duckworth, 1977), who also
include mathematicians: intelligence, dominance, tough-mindedness, intellectual
self-sufficiency and control (Cattell's Sixteen Personality Factors B, E, I, Q2 and Q3)
(Hutchings et al., 1975; Pont, 1970).


One factor in particular has been suggested as relevant to mathematics, because 
it is a ‘masculine’ subject. This is a masculinity-femininity dimension, as measured by 
several tests devised from questions which discriminate effectively between males and 
females (Anastasi, 1958; Astin and Myint, 1971; Haven, 1972). Lambert (1960) 
found that male mathematics majors did not differ from their non-mathematics 
counterparts on M-F score (MMPI), while female mathematics majors were more 
feminine than female non-mathematics majors. But there does appear to be a differ- 
ence between the sexes which is related to problem-solving ability (Milton, 1957), and 
to performance on the Embedded Figures Test of field independence (Bieri, 1960). 


Another aspect of masculinity-femininity is that of sex-role adoption. However her
understanding of behaviour appropriate to a girl is acquired, it is argued that any girl
is likely to be alienated from mathematics just because it is regarded as a masculine
subject, and will excel in it only if she identifies with a strong male figure (Plank and
Plank, 1954). The evidence for this thesis is more theoretical than empirical. Only a
few of the mathematical women in Osen's (1974) book about prominent mathematical
women manifest this relationship. It is nevertheless a persistent theme in the literature
(Maccoby, 1967), while the relationship with the mother, even if she herself has been
mathematical, has been virtually ignored.


Even if identification with a father or other male figure is not necessary, rejection 
of the stereotyped feminine role does seem to be, for academic success generally and 
mathematics in particular (Elton and Rose, 1967; Ваша and Hunt, 1975). Slee's 
‘Feminine image factor’ (1968), affecting girls aged 12 to 14, which accounted for a
considerable proportion of their attitudes to school subjects, was amply illustrated in 
Coleman's (1961) study of adolescents, which revealed that the brightest girls were 
careful not to appear so, and placed social status ahead of academic success. As 
Horner (1972) explains, girls are conditioned to feel that competence, independence, 
competition and intellectual achievement are good, but inconsistent with femininity, 
and will have negative consequences for women. Only by rejecting the aspect of
femininity which prescribes dependence and submission is a girl able to develop the 
ability to think analytically (Maccoby, 1970). Whether she is able to do this may 
depend on her upbringing, which is related to the education and social class of her 
parents. Bing (1963) found that spatial ability was fostered in children by giving them 
some freedom and independence, and Maccoby (1970) relates this to analytic thinking. 
There is considerable evidence that, particularly among middle-class parents, indepen- 
dence is encouraged more for boys than for girls (Brandis and Henderson, 1970; 
Newson et al., 1973; Sears et al., 1957). 


It seems clear that schools continue to reinforce society's sex-role stereotypes, 
particularly co-educational schools (Dale, 1974). There may indeed be social ad- 
vantages to co-education (Dale, 1969), but there are educational losses, particularly 
to girls, and particularly in subjects such as mathematics, which they are much less 
likely to choose at A-level in mixed schools (DES, 1975; Ormerod, 1975). 


Clearly many factors operate against girls choosing to study mathematics beyond 
compulsory levels. Those who do choose science, including mathematics, have usually 
been characterised by Cattell’s 16PF Factor E (assertiveness) and Factor I (tough- 
mindedness) (Hutchings et al., 1975; McNair, 1973), though mathematics without 
science has not usually been separated out. 


A qualification in mathematics is a necessary prerequisite for many careers and
fields of study (15 out of 20 subject areas at Berkeley: Binyon, 1977), and an orientation
towards it distinguishes girls preparing for careers from an early age (Astin,
1968a, 1968b). Of course, career options are significantly limited by subject choices 
made early in secondary school, often without any consideration of future career 
possibilities, and without much assistance from schools (Davies and Meighan, 1975). 


Because of its importance in relation to many careers, and because pre-emptive 
subject choices are often made early, it is important to understand why so few girls 
choose to study mathematics at A-level. Central to the discussion of the intellectual 
and mathematical performance of girls have been the masculine-feminine dimension, 
sex-role and parental identification. Must a girl reject the feminine role and identify 
with the masculine role, a male figure, specifically her father, in order to do well in the 
subject? What about her mother? And what relation does this identification have to 
other variables? 


This study attempts to illuminate some aspects of these problems, in particular:

(1) Parental identification, its relation to subject choice and other variables, for
girls.

(2) Masculinity-femininity, its importance in relation to subject choice and other
variables.

(3) Maternal influence as it affects subject choice.


METHOD 


The sample consisted of 499 girls taking A-level courses in 1975 and 1976 from 
14 schools in the education authorities of Lancashire and Cumbria, and two schools 
in Manchester. The intention, in the choice of sample, was to include as wide a range 
as possible of different kinds of schools and communities. Apart from the two 
schools in Manchester, three were in and around Preston, while the others were in 
smaller centres, giving the sample a rather rural slant. Six of the schools were single- 
sex grammar, four were mixed grammar, three comprehensive and three sixth-form 
colleges. 234 girls were taking mathematics, and 265 were taking French. The girls 
who chose French formed a contrast group for comparison with the mathematics girls, 
French as an A-level subject being considered to be of comparable academic difficulty, 
but chosen more by girls than by boys. 


Each girl completed Cattell’s Sixteen Personality Factor Test, the Fe Scale from 
the California Personality Inventory, and a questionnaire about herself and her 
attitudes. She also took home questionnaires for her parents to complete and return. 
Completed questionnaires were received for 366 mothers (186 French, 180 mathe- 
matics) and 343 fathers (175 French, 168 mathematics). Short interviews were held 
with girls from the 1975 sample who were available in 1976. 


RESULTS 


Personality Tests 

Mean Sten Scores are shown for the 16PF, which has been standardised on an
English population to a Sten Score with a mean of 5.5 (Saville, 1972). No English
standardisation of the CPI was available, so the American standardisation is used for
the Fe Scale. It is a Standard Score, with a mean of 5.0. Both groups, the mathematics
and French girls, have scores close to the mean. Results are shown for the
significance of the difference between uncorrelated means (Table 1).
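As an illustration of the test for uncorrelated means, the following minimal sketch recomputes a t statistic from the summary figures given for Factor A in Table 1 (means 4.79 and 6.00, standard deviations 2.21 and 2.19, N = 234 and 265). It assumes a pooled-variance Student's t, which the text does not specify; the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent samples, from summary statistics.

    Assumes equal population variances (pooled estimate); this is an
    assumption of the sketch, not something stated in the article.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, n1 + n2 - 2


# Factor A (reserved vs. outgoing): mathematics girls vs. French girls.
t_a, df_a = pooled_t(4.79, 2.21, 234, 6.00, 2.19, 265)
print(f"Factor A: t = {t_a:.2f} on {df_a} degrees of freedom")
```

Under these assumptions the statistic is about -6.1 on 497 degrees of freedom, which is in keeping with the very small probability reported for Factor A.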


The mathematics girls were more reserved, more emotionally stable, more tough-
minded and more desurgent (16PF Factors A, C, I and F) than the girls taking French,
as expected. They were also more experimenting and radical (Factor Q1), which,
though not predicted, is consistent, but they were unexpectedly revealed as more
group-dependent (Q2) than the French girls. On the Femininity Scale of the CPI, the
mathematics girls were shown as more ‘masculine’. Aggression (Factor E) did not
distinguish between the two groups.


TABLE 1
MEAN SCORES AND STANDARD DEVIATIONS OF PERSONALITY FACTORS BY SUBJECT CHOICE

            Mathematics          French
              (N=234)            (N=265)         t test
Factor      Mean     SD        Mean     SD         P<
A           4.79    2.21       6.00    2.19       0.0001
B           7.56    1.50       7.26    1.53       0.05
C           4.82    1.86       4.25    1.93       0.001
E           6.52    2.11       6.59    1.93       NS
F           6.90    1.90       7.25    1.75       0.05
G           4.94    1.92       4.91    2.08       NS
H           5.25    2.00       5.17    2.11       NS
I           3.33    1.90       4.87    1.84       0.0001
L           6.36    1.75       6.55    1.59       NS
M           5.74    1.82       6.04    1.82       NS
N           4.47    1.79       4.74    1.87       NS
O           5.32    1.94       5.45    1.71       NS
Q1          5.62    1.50       5.17    1.53       0.001
Q2          5.28    1.81       5.67    1.69       0.01
Q3          3.86    1.71       3.78    1.83       NS
Q4          5.71    2.00       5.03    2.03       NS
Fe          4.79    1.07       5.18    1.12       0.0001


Girls who identified with their fathers differed from those who identified with
their mothers in being more tough-minded (P<0.005, 16PF Factor I) and more
‘masculine’ (P<0.002, CPI Fe Scale).


Parental Identification 

In order to ascertain the girls’ parental identification, they were asked: “Which
parent do you think you resemble most, in character and personality (not appearance)?”
Other research (Bieri et al., 1959) had indicated that a direct question of this
sort produced clear answers with good agreement with other measures, and the
interviews held in 1976 confirmed the validity and reliability of the question as an
indication of parental identification. Rather surprisingly, the responses indicated that
the mathematics girls did not tend to identify with their fathers any more than the
French girls. Both tended to identify with their fathers rather than their mothers. If
father identification is a relevant factor in girls’ orientation to fields of study, it must
be one which operates on intellectual development generally, rather than mathematics
specifically.


TABLE 2
GIRLS’ PARENTAL IDENTIFICATION

                        Girls Identify with:
Subject Choice     Neither     Both     Father     Mother
Mathematics           8          6       121         99
                                        (52%)      (42%)
French                7         10       139        109

Chi-squared NS
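The non-significant chi-squared reported for Table 2 can be checked with a short sketch. It assumes a plain Pearson chi-squared on the 2 x 4 table without any continuity correction, and it takes the mathematics ‘Mother’ count as 99, the value implied by N = 234 and the 42 per cent shown in the table; the function name `chi_squared` is hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Columns: Neither, Both, Father, Mother (Table 2).
# The mathematics 'Mother' count of 99 is an assumption (see lead-in).
table2 = [
    [8, 6, 121, 99],    # Mathematics
    [7, 10, 139, 109],  # French
]
stat2, dof2 = chi_squared(table2)
CRITICAL_05_DF3 = 7.815  # chi-squared critical value, 3 d.f., P = 0.05
print(f"chi-squared = {stat2:.2f}, d.f. = {dof2}, "
      f"significant at 0.05: {stat2 > CRITICAL_05_DF3}")
```

On these figures the statistic falls well below the 5 per cent critical value for three degrees of freedom, consistent with the ‘NS’ entry in the table.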
A further question was included in the girls’ questionnaire to supplement that on
parental resemblance. It was: “Was there anyone at home, or at school, or other,
who advised you or influenced you in your choice of A-level subjects? Who?” If we
consider only those responses which gave a single source of influence, this did
discriminate effectively between the two subject groups, the mathematics girls being much
more likely to mention their fathers and the French girls their mothers. However, the
mathematics girls were most likely to mention school as the main influence, and the
French girls both parents jointly, so the distribution may well simply reflect social
stereotypes. Well over half the girls admitted to no influence at all on their choice of
subject. If we include all the influences mentioned, the significant differences vanish.


TABLE 3
INFLUENCE ON A-LEVEL SUBJECT CHOICE

Subject          Father     Mother     School     Father and
Choice            only       only       only        Mother
Mathematics        20          6         26           18
French              6         15         18           37

Chi-squared 19.90, P<0.005


A comparison of the girls who identified with their fathers and the girls who 
identified with their mothers revealed something of the nature of this identification. 
Of the girls with fathers in professional and higher managerial occupations, 63 per cent 
identified with their fathers, while those whose fathers were in the non-manual or 
skilled manual classes tended to identify with their mothers. There was a strong 
tendency also for girls to identify with fathers who were well educated. No such 
pattern was apparent in relation to mothers, but it was felt that any such tendency
would be obscured by assortative mating. 


Attitudes to subjects 
The parents’ attitudes to the two subject areas were indicated by their responses
to two questions: “Were you regarded as being good at mathematics/languages?”
and “Did you like mathematics/languages?” The attitudes of both parents appear
to be influential on the daughters’ subject choice (Tables 4 and 5). Girls are more
likely to choose mathematics if their mothers liked the subject, and if their fathers liked
it and were good at it. They are more likely to choose French if their mothers liked
languages, though their fathers’ influence is not established. A high level of mathematics
education on the father’s part is also significantly positively related to his
daughter’s choice of mathematics at A-level.


TABLE 4
PARENTS’ ATTITUDES TO MATHEMATICS

                         Good at                 Liked
Subject Choice        Yes       No           Yes       No
Fathers
  Mathematics         145       20           132       34
  French              128       48           115       62
  Chi-squared       11.48, P<0.005         8.33, P<0.005
Mothers
  Mathematics         125       49           119       57
  French              117       69           100       86
  Chi-squared        2.84, P<0.10          6.66, P<0.01


TABLE 5
PARENTS’ ATTITUDES TO LANGUAGES

                         Good at                 Liked
Subject Choice        Yes       No           Yes       No
Fathers
  Mathematics          44       57            46       54
  French               65       48            64       50
  Chi-squared        3.67, P<0.10           1.89, NS
Mothers
  Mathematics          60       49            57       51
  French               84       42            98       27
  Chi-squared        2.85, P<0.10          16.01, P<0.005

NOTE. The numbers in these tables are not always consistent,
because some of the questions were not answered.


TABLE 6
PARENTS’ MATHEMATICS STUDY

                                            Fathers              Mothers
Level of Study                          Maths    French      Maths    French
Less than School Certificate             48        56          79       80
Other (extra-school)                      6         2           -        -
School Certificate (O-Level)             33        67          74       87
School Certificate and extra             44        33          13        8
Higher School Certificate                14         7           6        6
Higher School Certificate and extra       9         6           3        2
University or more                       14         9           3        2
Chi-squared                            19.41, P<0.005              5





However, the attitudes of the girls to the two subjects, as distinct from their actual
choice, seem to be under different influences. These are related mainly to their
mothers’ attitudes, in both subjects (see Tables 7 and 8). Thus their attitudes are
influenced by their mothers, but their choice of subject by the parent of the appropriate
sex, fathers for mathematics, mothers for French. For the most part these attitudes
seem unrelated to parental identification, but there are two notable exceptions. Girls
are more likely to identify with a mother who liked mathematics and a father who was
good at languages (Tables 9 and 10).


TABLE 7
GIRLS’ ATTITUDES RELATED TO PARENTS’ ATTITUDES TO MATHEMATICS

Girls like               Good at                 Liked
Mathematics           Yes       No           Yes       No
Fathers
  Yes                 209       42           186       61
  No                   70       26            62       35
  Chi-squared        3.84, P<0.05           4.02, P<0.05
Mothers
  Yes                 185       73           169       91
  No                   57       46            51       52
  Chi-squared        8.13, P<0.005          6.88, P<0.01


TABLE 8
GIRLS’ ATTITUDES RELATED TO PARENTS’ ATTITUDES TO LANGUAGES

Girls like               Good at                 Liked
Languages             Yes       No           Yes       No
Fathers
  Yes                  94       79            93       80
  No                   15       27            17       25
  Chi-squared        3.58, P<0.10           1.89, NS
Mothers
  Yes                 127       72           141       57
  No                   17       20            15       21
  Chi-squared        3.36, P<0.10          10.67, P<0.005


Masculinity

The girls who chose mathematics scored significantly lower on the Femininity
Scale of the CPI: that is, they were more masculine in their interests than were the
girls who chose French (Table 1, above). This was true also of the girls who identified
with their fathers as opposed to those who identified with their mothers. In their
career plans, too, the girls who chose mathematics were more likely to have definite,
specific careers in mind, and these were less likely to be conventionally feminine
careers, than were the girls who chose French.


TABLE 9
PARENTAL IDENTIFICATION AND PARENTS’ ATTITUDES TO MATHEMATICS

Girls Identify           Good at                 Liked
with                  Yes       No           Yes       No
Fathers
  Father              140       40           126       57
  Mother              117       23           107       32
  Chi-squared             NS                     NS
Mothers
  Father              119       70           107       82
  Mother              106       42           103       47
  Chi-squared             NS                4.57, P<0.05


TABLE 10
PARENTAL IDENTIFICATION AND PARENTS’ ATTITUDES TO LANGUAGES

Girls Identify           Good at                 Liked
with                  Yes       No           Yes       No
Fathers
  Father               73       52            69       57
  Mother               30       47            36       40
  Chi-squared        6.06, P<0.025               NS
Mothers
  Father               74       40            78       37
  Mother               61       45            65       38
  Chi-squared             NS                     NS


TABLE 11
GIRLS’ PLANS FOR FUTURE

Plans                 Mathematics     French
Job                       20            42
Career, general           79           110
Career, specific          92            89

Chi-squared 7.45, P<0.025


But it should be pointed out that the French girls, although their career plans were
less specific, were still planning to have genuine careers, and seemed no more inclined
to a traditional domestic role than their mathematics counterparts. Although most
of the girls expected to marry, almost all of them wanted a demanding, challenging
occupation outside the house, and they expected to return to it after having children.
There was no relationship between parental identification and career plans.


In other respects the girls choosing mathematics manifested preferences which 
were more feminine than those of the girls who chose French. Among leisure interests 
and activities mentioned by them were more of the kind which could only be cate- 
gorised as feminine, such as sewing, cooking, or looking after children. Even if these 
were then sub-classified according to whether they were craft (sewing) or nurturant 
(baby-sitting), the mathematics girls still mentioned them at least as frequently. The 
same bias was apparent when a comparison was made on the basis of parental 
identification, the girls who identified with their fathers mentioning proportionally 
slightly more feminine and nurturant interests.


Double Mathematics

The small group of girls studying Further Mathematics as well as mathematics 
(known together as Double Mathematics) represent the nearest approximation we can 
achieve to mathematics choice separate from science, though for a number of reasons 
it is often accompanied by a science choice. 


Because they were so few in number (26) most results did not reach statistical 
significance, but there is a pattern. The Double Mathematics girls were more intel- 
ligent, more radical, more group-dependent and more masculine (Cattell’s B, Q1, Q2 
and Gough's Fe) than the main mathematics girls, who were anyway more intelligent, 
radical, group-dependent and masculine in their tastes than the French girls. They 
were also more conscientious, persevering, staid and rule-bound (Cattell’s Factor G) 
than the other mathematics girls, who did not differ on this factor from the French 
girls. They could be described as socially more conventional and intellectually more 
radical than the other mathematics girls. 


This is borne out in their leisure activities, which show them to be rather more 
social, nurturant and unusual in their tastes than the other mathematics girls, and 
indeed largely accounting for the bias in the main group in those directions. A 
marked enthusiasm for sport which characterised almost every Double Mathematics 
girl fails to distinguish them because most of the other girls mentioned sports as well 
(about 75 per cent of both mathematics and French girls). 


They bear a remarkable resemblance to Aiken’s (1973) description of professional
mathematicians (of unspecified sex) as “high on order, but low on the need for change;
they were reserved, sensitive, conscientious and conventional in their [illegible] but
highly individualistic in spirit”.


DISCUSSION 


Parental Identification 

Without data about the parental identification of other girls of different ages and 
in different categories—social, educational and economic—it is difficult to interpret the 
finding that all of these girls, in both French and mathematics, tended to identify with 
their fathers rather than their mothers. We do not know whether parental identifica- 
tion changes with age, or shows a developmental pattern. We do not know if it varies 
according to socio-economic status. The assumption underlying this study is that 
most girls will tend to identify with their mothers, whereas these girls, who have 
chosen academic subjects at A-level, have tended to identify with their fathers. 
However, evidence for this assumption is singularly lacking. All we can say is that
father identification is not a special characteristic of girls who choose mathematics. It
does seem to be related to fathers’ educational attainment and relative position in
society—positively related to success, in effect.


Femininity 

The degree to which these girls can be regarded as ‘masculine’ would depend
very much on what was designated by the word. The Fe Scale on the CPI proved
efficacious in distinguishing the mathematics girls from the French girls (though the
difference was less marked with the Double Mathematics girls). In so far as thinking
in mathematical ways, being ‘tough-minded’, thing-oriented, having interests and
attitudes more commonly associated with men, are concerned, then yes, they are
masculine. But they are also very feminine, not in the sense of being dependent and 
emotional, but in having feminine interests in homemaking, and nurturant urges. It 
is clearly possible to accept some aspects of a designated sex-role and reject others, as 
these girls have done. 


Certain distinctive personality features, particularly of the Double Mathematics 
girls, which do not form part of the typical profile of a science chooser, suggest a 
possible strategy for encouraging more girls to study mathematics to higher levels. 
These mathematics girls were not more aggressive and they were more group- 
dependent than similar girls who chose French, a ‘feminine’ subject. From this it 
might be suggested that there could be some advantage in dissociating mathematics 
from science in A-level options, offering it as a ‘ service’ subject, useful in many 
disciplines, and encouraging more conventional girls to pursue it. 


Although many of the personality differences found had been expected, they still 
do not satisfy the underlying question: Do girls choose mathematics because they are 
that sort of person, or are they that sort of person because they are good at mathe- 
matics? Do they identify with their fathers because they are tough-minded and have 
masculine interests, or the other way about? No simple answers can be expected, but 
it should be remembered that many of these personality factors are probably subject 
to environmental influences (Adcock, 1970). 


Relationship with Mother 

These girls are very much influenced by their mothers, whether they identify with 
them or not. The influence is basic in relation to mathematics and languages, and 
affects the attitudes of the girls to these subjects, while the fathers’ influence on these 
attitudes appears to be slight. In choosing subjects, however, girls are influenced by 
the parent of the appropriate sex: fathers for mathematics, mothers for languages. 
Conversely, girls are more likely to identify with a mother who liked mathematics, or 
a father who was good at languages. 


Without replication and further exploration it is not reasonable to draw conclusions
about this, but one can conjecture. If mothers’ attitudes are so important in
influencing daughters’ attitudes to subjects, it is all the more important that girls 
should acquire confidence and competence in mathematics, whether or not they 
need it in a career, so that their daughters in their turn can have a positive attitude 
to it. 


The finding that mothers who like mathematics are more likely to have daughters 
who identify with them gives rise to further speculation about the association of 
mathematics with competence and success, the factors that seem to incline girls to 
identification with fathers. 


If more girls are to be encouraged to choose mathematics as a course of study to 
higher levels, it appears from this study that they would be susceptible to persuasion 
at two stages in their lives: in early years, when the attitudes which they form to
different subjects are affected by their mothers’ attitudes; and later, when they make
choices of subjects, and they are influenced by people who appear influential, such as
fathers and teachers.


A CROSS-CULTURAL STUDY OF SCIENCE
CLASSROOM INTERACTIONS


By R. G. HACKER, B. L. HAWKES AND M. K. HEFFERNAN
(Department of Education, Mount Allison University, Canada)


SUMMARY. A British observational study of science classroom interactions was replicated 
in Atlantic Canada with 33 teacher-class units. Observational data from the Canadian 
classrooms were cluster-analysed and two preferred teaching styles emerged from this
analysis. It was concluded that patterns of teaching in the Canadian classrooms studied 
differed substantially from those patterns observed in classrooms in the United Kingdom. 
An overall comparison of the results of the two studies revealed a less practical approach 
to science teaching in the Canadian classrooms observed, with greater emphasis on the 
informational aspects of science and convergent, problem-solving activities. 


INTRODUCTION 


Very little attention has been given to the influences of national characteristics on 
patterns of teaching in science classrooms. Whereas some general inferences about
the effects of different educational policies and learning environments can be drawn 
from the Comber and Keeves (1973) study, as Power (1977) observes, to date no 
systematic, cross-cultural, observational study of science classrooms has been under- 
taken. 


One factor, which has undoubtedly hampered attempts to compare the data 
provided by studies carried out in different countries, has been the tendency for 
research workers to concentrate on developing new observational systems for their 
own particular studies. Often new systems have been developed without due regard 
for existing systems which seek to quantify similar aspects of classroom life. 


Eggleston et al. (1976) have attempted some approximate comparisons of British 
science classroom interactions with the results of studies carried out in America, 
Australia and New Zealand. However, as they point out, the equivalence of the 
categories incorporated into the different observational systems used is in some doubt. 
Also, a number of teacher and pupil variables which might influence class interactions 
differ markedly from one study to the next and, inevitably, their conclusions are of a 
highly speculative nature. 


The Eggleston study cited provides a wealth of information about those intel- 
lectual transactions which occur in science lessons in the United Kingdom. The 
observational instrument employed in this study (Figure 1) was the Science Teaching 
Observation Schedule (STOS), developed by Eggleston, Galton and Jones (1975). The 
schedule is limited to cognitive behaviours which might be expected to occur in science 
lessons. It is not concerned with the affective or managerial aspects of science 
classroom behaviours. A sign system for recording data is used with a time-sampling 
unit of three minutes and each category is checked once if it occurs during this period. 


The main dichotomy into which observations are classified is between those events
initiated by the teacher and those initiated or maintained by pupils. Teacher talk is
further sub-divided into three major categories: teacher questions, statements and
directives, while pupil activity is divided into two main categories: pupils seek
information or consult and pupils refer to the teacher. Each of these five major
categories is sub-divided into minor categories, the basic pattern being interactions
which are associated with: recall of facts or principles, problem-solving, hypothesising
and experimental procedures. In this context the term ‘problem-solving’ is used in a
convergent sense, where only one solution to the problem is acceptable to the teacher.
One interesting feature of the system is that the observer’s attention focuses on the
responses actually made by pupils and those cognitive skills which they are practising.
A user’s manual for the instrument and well developed training procedures for
observers are available (Eggleston et al., 1975).


FIGURE 1
THE SCIENCE TEACHING OBSERVATION SCHEDULE

All observed events are divided into teacher talk (questions or invitations to
comment, statements, directions) and talk and activity initiated and/or maintained
by pupils (pupils seek information, pupils refer to teacher).

1. TEACHER TALK
1a. Teacher asks questions (or invites comments) which are answered by:
    a1  recalling facts and principles
    a2  applying facts and principles to problem solving
    a3  making hypothesis or speculation
    a4  designing of experimental procedure
    a5  direct observation
    a6  interpretation of observed or recorded data
    a7  making inferences from observations or data
1b. Teacher makes statements:
    b1  of fact and principle
    b2  of problems
    b3  of hypothesis or speculation
    b4  of experimental procedure
1c. Teacher directs pupils to sources of information for the purpose of:
    c1  acquiring or confirming facts or principles
    c2  identifying or solving problems
    c3  making inferences, formulating or testing hypotheses
    c4  seeking guidance on experimental procedure

2. TALK AND ACTIVITY INITIATED AND/OR MAINTAINED BY PUPILS
2d. Pupils seek information or consult for the purpose of:
    d1  acquiring or confirming facts or principles
    d2  identifying or solving problems
    d3  making inferences, formulating or testing hypotheses
    d4  seeking guidance on experimental procedure
2e. Pupils refer to teacher for the purpose of:
    e1  acquiring or confirming facts or principles
    e2  seeking guidance when identifying or solving problems
    e3  seeking guidance when inferring, formulating or testing hypotheses
    e4  seeking guidance on experimental procedure


The data collection by Eggleston et al. (1976) from 95 science classrooms con- 
stitutes the most comprehensive information about science class interactions which is 
currently available. A cluster analysis of this data had resulted in the construction of 
a typology of science teaching styles, whereby three distinct teaching styles were 
identified. 


The advanced stage of development of the STOS device as an observational 
instrument and the comprehensiveness of the data available relating to science 
classrooms in the United Kingdom led to a decision to see if this observational work 
might be replicated in a Canadian setting, matching schools, teachers and pupils 
across those variables which might be expected to influence class interactions. 


METHODS 


Sample 

Twenty-one schools located in seven school districts across the three Atlantic 
Provinces of Nova Scotia, New Brunswick and Prince Edward Island participated in 
the project. As in the British study, schools with resources for science which were 
considered so poor as to preclude certain teaching strategies were excluded from the 
sample. 


The distributions of number of years of teaching experience for the Canadian and 
British science teachers showed similar positive skews with median values of 8.7 years
and 9.4 years respectively. Canadian grade nine classes were selected as corresponding
to the fourth form level of the British classes observed. 


To match the British study, equal numbers of teacher-class units studying biology, 
chemistry and physics respectively were included in the total group of 33 units. This 
was considered to be an important matching factor because the British study had 
indicated that the science discipline taught could exert considerable influence on the 
patterns of classroom interactions. 


Procedure 

Some pilot studies were carried out with videotaped lessons taught by local 
science teachers to find whether the STOS instrument could be used in its standard 
form, or whether some modifications for use in Canadian schools would be necessary. 
Accepting the limitations of the instrument to cognitive behaviours, observers felt that 
the system provided an accurate profile of intellectual transactions as they occurred 
and that valid distinctions between teaching styles could be made with the standard 
instrument in Canadian classrooms. 


All five observers involved with the project were science graduates and also 
qualified science teachers. Initial training of observers was carried out during one full 
week of intensive training, using live and video-recorded lessons. At the end of this 
week reliability trials were carried out according to the procedures recommended by 
Eggleston et al. (1975). Five videotaped science lessons, including locally-produced 
materials, were used for these reliability trials. 


For each observer, an inter-observer reliability coefficient, Ra, was calculated for 
each of the 23 categories of the STOS, as described by Eggleston et al. (1975). Also a
group reliability coefficient, Re, was calculated for each category according to the 
analysis of variance design of Medley and Mitzel (1958). 


The results of these reliability trials, which are provided in the Appendix, were 
considered to be very satisfactory. During observation periods in the schools brief
retraining sessions and further reliability trials were conducted to ensure that these
high levels of observer reliability were maintained.


Initial contacts with the schools were made via the superintendent of the school
district. Principals were asked to co-operate in the research and to provide the names 
of science staff who would be willing to participate in the study. These teachers were 
asked to select six lessons which they felt were typical of their teaching. Visits to 
classes by different observers were randomised amongst the teachers to avoid sys- 
tematic observer errors in the results. Each teacher was observed for an average of 
between three and four hours and a minimum of three hours observation per teacher 
was required for inclusion of that teacher's data in the results. 


RESULTS 


For each teacher-class unit a percentage-use figure was calculated for each of the 
23 STOS categories. For example, if a certain teacher was observed for a total of 60 
three-minute time sampling units and use of category a, was recorded in 45 of those 
units, then that teacher's use of category a, was recorded as 75 per cent. 
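The percentage-use calculation can be sketched as follows. The sketch assumes, purely for illustration, that each three-minute unit is held as a set of the category codes checked during it (the sign system records a category at most once per unit); the function names and the sample records are hypothetical.

```python
def percentage_use(units_with_category, total_units):
    """Percentage of time-sampling units in which a category was recorded."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return 100.0 * units_with_category / total_units


def percentage_use_from_records(records, category):
    """records: one set of category codes per three-minute unit (hypothetical layout)."""
    hits = sum(1 for unit in records if category in unit)
    return percentage_use(hits, len(records))


# The worked example from the text: 45 of 60 units.
example = percentage_use(45, 60)

# A tiny hypothetical record of four units for one teacher.
records = [{"a1", "b1"}, {"b1"}, {"a1", "c1"}, {"a1"}]
from_records = percentage_use_from_records(records, "a1")
print(example, from_records)
```

Because a category is checked at most once per unit, the figure measures how widely a behaviour is spread over the lesson rather than how many times it occurred.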


Cluster analysis 

This percentage-use data was cluster-analysed in order to group the 33 teachers 
on the 23 STOS behaviours in such a way as to minimise within-group variations and 
maximise variation between groups. This process involves calculation of a measure 
of similarity between teachers across the categories and the formation of groups of 
teachers sharing similar patterns of behaviour with respect to the STOS categories. 
The groups which emerge from this analysis can be identified as distinct cognitive 
teaching styles. 


Everitt (1974) discusses various forms of cluster-analysis, the choice of an
appropriate technique being determined by a consideration of both the objectives of
the analysis and the type of data to be analysed. The form of analysis selected for this
study was that employed by Eggleston et al. (1976), involving firstly a single-link,
agglomerative, hierarchical analysis with calculation of a distance coefficient as the
appropriate measure of similarity between teachers. The data from this analysis was
then used as a starting configuration for an iterative optimisation procedure described
by Eggleston et al. (1976), to relocate any teachers who were poorly classified in the
initial clustering process.
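The first stage of this procedure, single-link agglomerative clustering, can be illustrated with a minimal sketch. It assumes Euclidean distance as the distance coefficient, which the text does not specify, and it omits the iterative relocation stage altogether. The five teacher profiles are invented percentage-use figures on three categories, not data from the study.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two percentage-use profiles (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_link(profiles, n_clusters):
    """Single-link agglomerative clustering, stopped at n_clusters groups.

    Returns the final clusters (lists of profile indices) and the merge
    history as (distance, merged_members) pairs, from which a dendrogram
    could be drawn.
    """
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, sorted(clusters[a] + clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters], merges


# Hypothetical percentage-use profiles for five teachers on three categories.
profiles = [
    [90, 10, 5],
    [85, 15, 10],
    [88, 12, 8],
    [40, 60, 50],
    [35, 65, 55],
]
clusters, merges = single_link(profiles, n_clusters=2)
print(clusters)
```

In the study itself the number of groups was read from the dendrogram rather than fixed in advance; the `n_clusters` argument here simply stands in for that judgement.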


The number of groups resulting from this analysis was determined by examination
of the dendrogram resulting from the analysis, according to the procedures described
by Everitt (1974) and Eggleston et al. (1976).


Two distinct cognitive teaching styles emerged from this analysis and the frequen- 
cies of use of the 23 STOS categories by these two groups of teachers are compared 
in Table 1. The significance of differences in usage of the categories by the two groups 
was determined by the Mann-Whitney U test (Siegel, 1956). The results are given in
the third row of Table 1. 
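For readers unfamiliar with the Mann-Whitney U test, the statistic itself is simple to compute. The sketch below counts, over all pairs of observations, how often a value from one group exceeds a value from the other, with ties counted as one half. The two samples are invented percentage-use figures, not data from the study, and no significance level is derived.

```python
def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U: the smaller of the two pairwise-comparison counts."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5  # ties contribute one half
    u_b = len(sample_a) * len(sample_b) - u_a
    return min(u_a, u_b)


# Hypothetical percentage-use figures for one category in two teaching styles.
style_a = [12, 20, 25, 30]
style_b = [45, 50, 28, 60]
u_value = mann_whitney_u(style_a, style_b)
print(u_value)
```

A small U indicates little overlap between the two groups; the value would then be referred to tables of critical values for the two sample sizes.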


Mean percentage-use figures for the 23 STOS categories by the Canadian teacher-
class units (N = 33) are compared with corresponding figures for the British teacher-
class units (N adjusted = 90) in Table 2. The significance of differences in usage of the
STOS categories was determined by the Kolmogorov-Smirnov two-tailed test (Siegel,
1956) and the results are given in the third row of Table 2.


DISCUSSION AND CONCLUSIONS 


As with the British study, truly remarkable variations in use of certain STOS
categories were recorded from one teacher to the next. For example, mean usage of
category b, for different Canadian teachers ranged from 12 per cent to 100 per cent.


The two preferred teaching styles which emerged from cluster analysis of the 
Canadian classroom data are characterised as follows. 


Style A (N = 21)

This teaching pattern is characterised by frequent use of a relatively small number
of the STOS categories. A very high incidence of teachers’ statements of fact or
principle and directions to sources of information to confirm facts or principles
(categories b1 and c1 respectively) is mirrored by frequent pupil referral to the teacher
for the purpose of confirming facts and principles (category e1). A paucity of other
pupil talk and activity and also of teacher questions, except those answered by recall
of facts and principles, confirms the fact-acquiring emphasis of this group. This
teaching style might be described as didactic, teacher-directed and theoretical.


Style B (N 12) 

Distinctive features of this group include relatively high frequencies of teacher's 
questions which are answered by applying facts or principles to problem solving and 
questions answered by direct observation (categories a, and a; respectively). Higher 
levels of pupil initiated and maintained behaviour concerned with problem solving 
(categories d, and e;) confirm a shift in emphasis from acquisition of facts towards 
science as a problem-solving activity. This problem solving occurs in both theoretical 
and practical exercises, though more often the former. Low usages of categories a3,
a4, b3, c3, d3 and e3 emphasise the convergent nature of these problem-solving activities.


Substantial evidence for the validity of these clusters was derived from written 
reports provided by observers. At the time of observation each observer was asked 
to complete a detailed description of the lessons taught by a particular teacher, in 
terms of the general dimensions of teaching measured by the STOS instrument. These 
written reports gave close agreement with the subsequent classification of teachers 
according to the above typology. 


The biology teachers, in the sample observed, showed an overwhelming preference 
for Style A, whilst Style B proved to be more popular with teachers of chemistry and 
physics. Similar variations were reported in the British study cited and this shift in 
emphasis from facts and principles to problem-solving activities would seem to reflect 
the different stages of development of these science disciplines. 


A cross-cultural comparison 

Cluster-analysis of the data collected from British science classrooms had led to 
construction of a typology of teaching styles whereby three preferred teaching patterns 
were identified. The main features of these teaching styles are described by Eggleston 
et al. (1976) as follows: 


Style I (N = 45)

The initiative is held by the teacher who challenges pupils with a compre- 
hensive array of questions, observational, problem solving and speculative, in 
both practical and theoretical contexts. Teachers’ statements reflect orientation 
towards science as a problem-solving activity. 


Style II (N = 32)

Characterised by a relatively infrequent use of teachers’ questions excepting 
those demanding recall and application of facts and principles to problem solving. 
There is a high incidence of teachers’ statements of fact and teachers’ directions
to sources of information for fact-finding. There is a non-practical bias and
fewer transactions are inferential or speculative.


Style III (N = 17)

A more ‘pupil-centred’ style, initiatives by pupils occur more frequently.
The work is relatively practical and the level of intellectual engagement is high. 


A comparison of the typologies provided by the two studies reveals that Style A 
and Style II have similar profiles for use of the STOS categories. Both are non- 
practical, teacher-directed styles with emphasis on the informational aspects of science. 
Whereas Style A is the preferred style of approximately two-thirds of the Canadian 
science teachers observed, Style II is the preferred style of roughly one-third of the
British science teachers observed. 


Style B shares some common characteristics with Style I, the problem-solving 
style adopted by almost half of the British science teachers observed, in so far as there 
is a de-emphasis of knowledge of facts and principles in favour of problem solving. 
However, the convergent nature of the problem solving associated with Style B 
contrasts with the more speculative nature of Style I's problem-solving activities. 


Table 2 provides an overall comparison of the results of the two studies. A 
summation of the frequencies of teacher-directed behaviours (categories a1-7, b1-4,
c1-4) and pupil-directed behaviours (categories d1-4, e1-4) provides the familiar
Flanderian teacher-dominated/pupil-initiated ratio. Respective mean figures of 64 per 
cent and 69 per cent teacher-dominated interactions for the Canadian and British 
classrooms are quite similar. 


It is interesting to note that along the heuristic-didactic dimension of teaching 
substantial differences in teaching patterns are apparent. Categories b1, c1, d1 and e1,
which are associated with a didactic approach, were all more frequently used in the
Canadian classrooms, with percentage-use figures for category e1 and categories b1, c1
and d1 being significantly higher at the 0.01 and 0.05 levels respectively. Percentage-
use figures for the convergent, problem-solving categories a2 and d2 were also
significantly higher at the 0.05 level in the Canadian science lessons. Corresponding figures
for all categories which are associated with heuristic teaching strategies (categories
a3, b3, c3, d3 and e3) were lower in the Canadian classrooms, the differences for
categories a3, b3 and d3 and category c3 being significant at the 0.01 and 0.05 levels
respectively. A summation of the percentage-use figures provided in Table 2 shows 
that behaviours associated with heuristic teaching patterns occurred approximately 
three times more frequently in the British classrooms. These results confirm a more 
didactic approach to science teaching in the Canadian classrooms observed, with 
greater emphasis on acquisition of facts and principles and convergent problem- 
solving activities. 


All categories related to practical work (b4, c4, d4 and e4) were used less frequently
in the Canadian science lessons, percentage-use figures being significantly lower at the
0.01 level for categories b4 and e4 and significantly lower at the 0.05 level for category
d4. From the percentage-use figures given in Table 2 it is probably safe to conclude
that practical work, including demonstrations and class experiments, accounted for an
average of between 10 per cent and 20 per cent of class time in the Canadian schools
and between 30 per cent and 50 per cent of class time in the British schools
visited.


In conclusion, a cautionary note about the generality of these findings may be 
necessary. While it is thought that the sample studied is representative of science 
teaching in the Maritime Provinces of Canada, for the selected teacher and pupil 
variables, it is quite unlikely that this sample provides a representative picture of 
Canadian science teaching as a whole. The wide geographical dispersion of the 
Canadian science teaching population, coupled with substantial provincial autonomy 
in educational decision making, suggests that far greater regional variations in teaching 
practices may be expected than in the United Kingdom.


AN ASSOCIATION BETWEEN HIGH INTELLECTUAL
ABILITY AND AN IMAGINATIVE AND ANALYTIC 
APPROACH TO THE DISCUSSION OF OPEN 
QUESTIONS 


By LYNN MICHELL* AND R. D. LAMBOURNE
(Department of Educational Psychology, University of Birmingham) 


SUMMARY.—An experiment was designed to find out whether there were any quantitative
and qualitative differences in the spoken discourse of 16-year-old pupils of ‘high’ and
‘low’ ability in discussions of problems arising from textual material. Cognitive,
linguistic and qualitative analyses of the discourse were carried out. Results from the 
cognitive-based method of analysis suggested that pupils in the ‘ high’ ability groups 
sustained discussion for longer, achieved a higher cognitive level of discourse and asked 
more interpretative questions. At a linguistic level, these pupils used more expressions 
of tentativeness and included more general referents in their discussions. A qualitative 
analysis of the discussions revealed that individuals who operated at the higher cognitive 
levels were able, within discussion groups, to achieve a more complex understanding of 
issues. 


INTRODUCTION 


RECENTLY there has been considerable emphasis on the importance of providing 
opportunities in the classroom for pupils to talk through problems and issues as a 
preparation for written work. However, it has not always been clear how informal 
discussion between pupils should be structured, and whether this activity was an 
intellectually profitable one for all pupils. An earlier investigation suggested that some 
pupils gained very little from these experiences since they seemed to rely heavily on 
cliché and anecdote, and found it difficult to sustain discourse at a critical, interpretive 
level (Michell and Peel, 1977). Until recently, methods of analysis for classroom 
discourse have usually been designed to monitor the formal class lesson and have 
focused on the linguistic strategies of the teacher and the class as a whole, rather than 
on the levels of understanding shown by individuals (Bellack et al., 1966; Flanders,
1970; Coulthard and Sinclair, 1975). In contrast, the focus of the research presented 
in our previous study and in this paper has been the intellectual quality of pupil talk. 
Many writers from different research areas have acknowledged the existence of a 
hierarchy of thinking skills which culminate in complex, purposeful thinking, and 
have developed methods of describing this continuum through the analysis of 
children's behaviour and of their written and spoken language (Inhelder and Piaget, 
1958; Vygotsky, 1962; Bruner et al., 1966; Squire, 1964; Moffett, 1968; Tough, 
1977). Peel (1966, 1971, 1975a, 1975b) has developed a method of classifying the
different levels of thinking that are associated with both age and intellectual maturity 
and his categories of Describer and Explainer thinking offered an appropriate starting 
point for a cognitive-based method of analysis for spoken discourse (Michell, 1976). 
In a pilot study (Michell and Peel, 1977), this method of analysis proved sensitive to 
the intellectual quality of pupils’ ideas in discussions with and without the teacher. 
The same method of analysis is used here to provide detail about the ways in which 
pupils of different intellectual ability think about, and talk about, ideas and issues 
arising from prose material. Previous research in adolescent judgment has empha- 
sised the relationship between intellectual ability and written judgments, but previous 
investigations have not included the analysis of spoken discourse. 


* Now at the Department of Child Health Research Unit, University of Bristol.


METHOD


Outline 

In the experiment reported here, 16-year-old pupils wrote judgments in response 
to two different passages on the theme ‘ Road Accidents’, first without any prior 
preparation and second after talking through the questions on the passages in small 
friendship groups. These discussions were recorded and this allowed comparisons to 
be made between the discourse of pupils of ‘high’ and ‘low’ ability. The intellectual
and interactional features of the discourse were analysed using the cognitive-based 
method of analysis, certain linguistic features of the discourse were measured, and a 
more detailed, qualitative analysis of the discussions was carried out. Pupils also 
completed three tests of ability, a test of general intelligence and two more specialised 
tests designed to measure a predilection towards more general and interpretative 
modes of thinking. The analysis of the written judgments and the effect of discussion 
on written judgments are not dealt with in this paper, and for these the reader is 
referred to Michell (1978b). 


Design of the test material 

Four passages on the theme ‘Road Accidents’ were used as the basis for written
judgments and as the stimulus for the discussions. Britton’s (1971) model of language 
functions provided the theoretical basis for the classification of the prose, and four 
categories were chosen to offer the clearest stylistic and linguistic differences between 
the passages: 


(1) Poetic: a short story, written for adolescents, about a newspaper reporter 
who goes to the scene of a bad accident. 

(2) Expressive: a transcript of a personal statement by an accident victim who 
was seriously hurt in a car crash which killed his girl friend. 

(3) Transactional-Report: a passage written in the style of a local newspaper 
report about an accident involving a coal lorry and two school buses. 

(4) Transactional-Analogic: a more technical piece about the causes of road 
accidents and a discussion of preventative measures. 


Five questions were designed for each passage along the lines established by 
previous research in this area (Peel, 1971, pp. 24-25). None of the questions could be
adequately answered wholly from the information given in the passages. The four
sets of questions are given in Michell (1978a). In a previous study, the written
responses of pupils to the four passages were compared (Michell, 1978b). No significant
differences were found. The responses to all four passages were, therefore, combined 
into a single sample for the analyses reported here. 


The tests 
In addition to the main testing described above, pupils completed three tests: 


The AH4 Test of General Intelligence (Heim, 1970). This test was chosen because
it is suitable for pupils of a wide range of ability. The test is divided into two parts, 
Verbal and Non-verbal, so that it is possible to use scores from each part separately 
for comparisons with other results. 


Peel’s Test of Generalising Ability (Peel, 1975c). A test which measures pupils’ 
tendencies to ‘ generalise’ and ‘ particularise’. For each item pupils are presented 
with three notions, for example charity, sympathy, and tolerance, and from a list of 
four alternative terms have to select the one which best expresses the overall meaning 
of the first three notions. These alternative items include a more specific notion 
(voluntary work), a more general notion (humanity), a notion at the same level
(generosity), and a non-essential attribute (lacking in some people). The test consists 
of 20 such multiple-choice items and pupils scored one mark for each general notion 
chosen.


Response to Literature Test, ‘Man by the Fountain’ (Young and Michell, 1977).
This multiple choice test was developed because it was felt that Peel’s Test placed too 
great an emphasis on previously acquired vocabulary. Pupils have to choose from a 
series of statements about a short story the one which has most meaning for them. 
Each item consists of five such statements, each one representative of one of Peel’s 
judgment levels from Restricted to High-level Explainer. The test is similar to that 
developed by Ellis (1977). 


The sample 

Two hundred 16-year-old pupils (77 boys and 123 girls) were tested in two schools 
and two sixth-form colleges in the Midlands area. They were involved in CSE and 
GCE O-level and A-level courses, and represented a wide range of ability with scores 
on the AH4 Test of General Intelligence ranging from 23 to 122. Initially, 200 pupils 
took part in the experiment, all of whom completed the AH4 Test and Peel’s Test of 
Generalising Ability. Unfortunately, the absence rate was high in the college from 
which the largest sample had been drawn and so only a proportion of the original 
sample completed all the tests. 


TABLE 1 
NUMBERS OF PUPILS WHO COMPLETED THE TESTS 


AH4 Test of General Intelligence 200
Peel’s Test of Generalising Ability 200 
Response to Literature Test 150 
Written judgments: No preparation 188 
Written judgments: After discussion 166 


Recording the discussion 

Before writing their responses to one of the two passages, pupils talked through 
the questions in friendship groups (of three or four) for as long as they wished. All
the pupils involved in the experiment had been used to working in small groups and 
so the only unfamiliar aspect of the testing was the recording. However, the general 
behaviour of the pupils and the comments made during the recordings suggested that 
they settled well to this somewhat demanding situation. In general, the quality of the 
recordings was good. Where there was any problem hearing what was said (e.g. when 
pupils talked all at once throughout much of the discussion) or where there was any 
doubt as to which pupil made which utterance, the transcript was excluded from the 
final analyses. Out of the original 166 pupils who took part in the discussions and 
wrote responses afterwards, there were 105 pupils (34 discussion groups) whose 
transcripts were complete and accurate enough to be used in the statistical analysis. 
The pupils checked through their own transcripts after they had been typed to make 
sure that utterances had been ascribed to the right individuals. 


Coding of the written responses 

The basis for the categorisation of the pupils’ written responses was Peel’s 
distinction between Describer and Explainer judgments, with the precise method of 
categorisation that of Michell (1978a). Briefly, six categories were used which encom- 
passed the full continuum of judgments from illogical and trivial to analytic and 
imaginative: 

1. Restricted. (Scores 1): illogical, tautological and irrelevant responses. 

2. Describer. (Scores 2): responses where one piece of information was quoted 
from the text. 

3. Describer. (Scores 3): responses where more than one piece of information
was quoted from the text.

4. Low-level Explainer. (Scores 4): responses where the focus was still on the
given information but where there was some additional comment or explanation.

5. Explainer. (Scores 5): interpretative, analytic and imaginative responses with 
explicit reasoning. 

6. High-level Explainer. (Scores 6): exceptionally mature, sensitive and well 
argued responses. 


Analysis of the spoken discourse 

There were three main categories for spoken discourse: 

(1) Target discourse (excluding questions): statements and comments which
related to the passage, the questions and to related ideas and experiences.

(2) Target questions.

(3) Interaction: discourse concerned primarily with social interaction between
pupils.

Target discourse was sub-divided into the same judgment levels that were used in 
the classification of the written responses, and the same continuum from Restricted to 
Explainer was identifiable in the spoken discourse. Target questions (equivalent to the 
teacher's Leads in the classification reported in Michell and Peel, 1977) did not span 
such a wide range of thinking and seemed to be of two types: (1) those which asked 
for clarification of specific points in the passage where a factual answer could be given 
(Describer), and (2) those which were concerned with interpretation and where there 
was no single, factual answer (Explainer). 


In the discussions involving 16-year-olds, less of the discourse was concerned with 
Interaction than the discussions of the 14-year-olds that were analysed in the pilot 
study, possibly because of the more structured nature of the test context, or possibly 
because of the age difference in the two samples. In the earlier study, six categories 
had been used to classify the Interactional discourse: Support, Task, Elicit, Request
and Explanation, and Conflict. Here, the majority of the discourse fell into the first 
four categories: 

Support: comments which offered support to other members of the group or 
which showed recognition and acceptance of a previous remark. 

Task: comments which focused on the task of taking part in the discussion and 
of answering the questions. 

Elicit: appeals to individuals, or to groups as a whole, to answer the question or 
to offer an opinion on a previous remark. 

Aside: short comments made by pupils, more to themselves than to an audience. 


A summary of the main categories and sub-categories is given in Table 2. 


TABLE 2
SUMMARY OF MAIN CATEGORIES AND SUB-CATEGORIES

1. Target discourse                2. Target questions      3. Interaction
1.1 Restricted (R)                 2.1 Describer (QD)       3.1 Support (s)
1.2 Describer (D)                  2.2 Explainer (QE)       3.2 Task (t)
1.3 Low-level Explainer (E1)                                3.3 Elicit (e)
1.4 Explainer (E)                                           3.4 Aside (a)
1.5 High-level Explainer (E+)





Linguistic analysis of the spoken discourse 

The spoken discourse was analysed for what it could reveal of the pupils’ styles 
of thinking in a discussion context in which they had to solve problems arising from 
prose material. From the language, inferences were made about pupils’ ability to
explain, to interpret and to analyse. While it was beyond the scope of this research to
provide a detailed linguistic analysis of the discourse, it was possible to select and
measure two linguistic features which seemed to be particularly relevant to the
distinction between Describer and Explainer thinking:


(1) Turner and Pickvance (1971) identified and measured certain indices of 
tentativeness in the spoken language of young children, and several of these were 
appropriate for the present analysis. The following indices were included in the 
comparison of ‘high’ and ‘low’ ability groups: modal adjuncts ‘perhaps’, ‘possibly’
and ‘probably’, verbal auxiliaries ‘might be’, ‘might have’, ‘could be’,
‘could have’, the disjunction ‘or’, and the conditional form ‘if ... then’.

(2) Peel (1975c) has developed a method of measuring the particularity or gen- 
erality of nouns and pronouns in a written text. Nouns and pronouns referring to par- 
ticular people, places or events were counted and classified as Particular in contrast 
to the General referents. The G/T Ratio used in the analysis was: 


G/T Ratio = (number of general nouns and pronouns) / (total number of nouns and pronouns)





For both linguistic analyses, the first two pages of the transcript from each discussion 
group were coded. 
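
The two linguistic measures can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names are hypothetical, the tentativeness indices are matched by simple pattern search over a transcript string, and the General/Particular classification of each noun or pronoun is assumed to have been made beforehand by a human coder and supplied as a list of 'G' and 'P' codes.

```python
import re

# Indices of tentativeness named in the text (after Turner and Pickvance, 1971).
# Matching them with regular expressions is an assumption of this sketch.
TENTATIVE_PATTERNS = [
    r"\bperhaps\b", r"\bpossibly\b", r"\bprobably\b",
    r"\bmight be\b", r"\bmight have\b",
    r"\bcould be\b", r"\bcould have\b",
    r"\bor\b",
    r"\bif\b.*?\bthen\b",
]


def tentativeness_proportion(text):
    """Expressions of tentativeness divided by the total number of words."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    hits = sum(len(re.findall(pattern, lowered)) for pattern in TENTATIVE_PATTERNS)
    return hits / len(words) if words else 0.0


def gt_ratio(referent_codes):
    """G/T ratio from hand-coded referents: 'G' = General, 'P' = Particular."""
    total = len(referent_codes)
    general = sum(1 for code in referent_codes if code == "G")
    return general / total if total else 0.0
```

In this sketch a transcript with four tentative expressions in ten words gives a proportion of 0.4, and a list of two General and two Particular referents gives a G/T ratio of 0.5.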


RESULTS 


Scoring the tests, the written judgments and the spoken discourse 

On the three ability tests, it was possible for pupils to achieve the following 
maximum scores: AH4 Verbal (65), Non-verbal (65), Total (130); Test of General- 
ising Ability (20); and Response to Literature Test (60). For the written judgments, 
pupils could achieve a maximum score of 6 (High-level Explainer) for each question 
and a total score of 30 for each passage. Mean score per question and per passage 
are shown in Table 3. Ten measures were included in the analysis of the spoken 
discourse, both for individual pupils and for the discussion groups as units, and these 
are described briefly below: 


1. Length of discussion (discussion groups only). 

2. No. of utterances: the total number of scored utterances. 

3. Percentage Target discourse: the ratio of the number of utterances scored as 
Target discourse to the total number of scored utterances, expressed as a 
percentage. 

4. Level Target discourse: the mean cognitive level (between 1 and 6) of
utterances within the Target discourse.

5. Describer questions

6. Explainer questions

7. Support (Interactional category)

8. Task (Interactional category)

9. Elicit (Interactional category)

10. Aside (Interactional category) 


Table 3 reports means and standard deviations for these principal variables. 


Measures 5 to 10 are expressed as the number per pupil.
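
As an illustration of how measures 2 to 10 might be derived for one pupil, the following Python sketch works from a list of coded utterances. The data layout is an assumption made for the example: each utterance is a (category, level) pair, where category is 'target', 'QD', 'QE', 'support', 'task', 'elicit' or 'aside', and level is the 1 to 6 judgment score for Target discourse and None otherwise. Questions are treated as separate from Target discourse, following the three main categories described above.

```python
from collections import Counter

INTERACTION_CATEGORIES = ("support", "task", "elicit", "aside")


def discourse_measures(utterances):
    """Summarise one pupil's coded utterances (hypothetical data layout)."""
    counts = Counter(category for category, _ in utterances)
    target_levels = [level for category, level in utterances if category == "target"]
    total = len(utterances)
    measures = {
        # Measure 2: total number of scored utterances.
        "n_utterances": total,
        # Measure 3: Target discourse as a percentage of all scored utterances.
        "pct_target": 100.0 * len(target_levels) / total if total else 0.0,
        # Measure 4: mean cognitive level of the Target discourse.
        "level_target": (sum(target_levels) / len(target_levels)
                         if target_levels else None),
        # Measures 5 and 6: question counts.
        "describer_questions": counts["QD"],
        "explainer_questions": counts["QE"],
    }
    # Measures 7 to 10: counts for the Interactional categories.
    for name in INTERACTION_CATEGORIES:
        measures[name] = counts[name]
    return measures
```

Group-level figures could be obtained in the same way by pooling the utterances of all pupils in a discussion group before calling the function.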


The relationship between intellectual ability and the nature of the written judgments and 
the spoken discourse 

Correlations between the pupils’ scores on the three ability tests, the cognitive 
level of the written judgments and the cognitive level of the Target discourse were all 
significant at the 0.1 per cent level. This result gave a first indication of a positive
relationship between the quality of pupils’ spoken discourse and their intellectual 
ability. AH4 Total and Verbal scores correlated more highly than AH4 Non-verbal 
scores with the cognitive level of the discourse. Results are presented in Table 4. 


TABLE 3
MEANS AND STANDARD DEVIATIONS

Variable                                   Mean      SD
AH4 (Total)                                81.5      20.7
AH4 (Verbal)                               35.6      10.7
AH4 (Non-verbal)                           45.8      11.6
Generalising Test                          16.1      2.83
Response to Literature                     47.3      7.88
Written Judgments (no discussion)          17.3      4.03
  (Mean score per question)                (3.46)
Written Judgments (after discussion)       17.8      4.03
  (Mean score per question)                (3.56)
Level of Target discourse                  2.25      0.48
(N = 105)


TABLE 4
PRODUCT-MOMENT CORRELATION MATRIX

                                   AH4     AH4      AH4 Non-  Generalising  Response to  Written Judgment  Written Judgment
                                   Total   Verbal   verbal    Test          Literature   (no prep.)        (after disc.)
AH4 Verbal                         0.92
AH4 Non-verbal                     0.94    0.73
Generalising Test                  0.58    0.64     0.45
Response to Literature             0.47    0.50     0.39      0.40
Written Judgment (no prep.)        0.58    0.62     0.47      0.42          0.44
Written Judgment (after disc.)     0.65    0.69     0.52      0.48          0.46         0.78
Cognitive level (spoken disc.)     0.61    0.61     0.52      0.37          0.40         0.56              0.64


Inspection of the correlations suggested therefore that there was at least one
common factor underlying the variables and that this factor was concerned with the
intellectual ability of the pupils. This meant that two further statistical analyses could
be carried out:


(i) A more detailed and specific comparison of the discourse of pupils of ‘high’
and ‘low’ ability.

(ii) A factor analysis to determine whether there were any other common factors 
which were not in evidence from the correlations alone. 


In order to look at the detailed differences in the spoken discourse, the sample of 
105 pupils was divided on a post-hoc basis into two broad bands of ability. Pupils’ 
scores on the AH4 Test of General Intelligence (Total) were placed in rank order and
a cut-off point was made at 83.5. For the analysis of the written judgments this gave
two groups of 93 (‘high’) and 95 (‘low’), and for the analysis of the spoken discourse,
which concerns us here, two groups of 49 (‘high’) and 56 (‘low’) with mean AH4 total
scores of 101.1 (SD = 9.2) and 65.1 (SD = 12.8) respectively.


Nine of the ten measures of spoken discourse (leaving out the length of discussion) 
were included in the comparison of the two groups of pupils. Results are given in 
Table 5. 


TABLE 5
COMPARISON OF ‘HIGH’ AND ‘LOW’ ABILITY PUPILS

Analysis of Variance

Variable                          Level   Mean    SD      F-ratio   Level of Significance
Number of Utterances              high    25.9    15.8
                                  low     18.2    9.7     9.30      P<0.01
Percentage of Target Discourse    high    80.7    14.3
                                  low     71.2    14.1    1.66      NS
Level of Target Discourse         high    2.54    0.41
                                  low     2.00    0.39    49.4      P<0.001
Describer Questions               high    0.47    0.78
                                  low     0.46    0.82    0.00      NS
Explainer Questions               high    0.86    1.57
                                  low     0.25    0.84    6.34      P<0.05
Support                           high    2.20    2.55
                                  low     1.77    1.58    1.14      NS
Task                              high    2.33    3.60
                                  low     1.52    1.70    2.25      NS
Elicit                            high    0.18    0.49
                                  low     0.48    0.87    4.50      P<0.05
Aside                             high    0.94    1.45
                                  low     0.89    1.45    0.03      NS


It was expected from the earlier correlation coefficient that there would be a
significant difference between the groups for the mean level of Target discourse; an
F-ratio at the 0.1 per cent level of significance for this measure therefore supported the
earlier result. The comparison of the two groups further revealed that the more able
pupils asked significantly more Explainer level questions and were able to discuss the
questions at greater length (Number of Utterances). There was little evidence of
differences in the Interactional categories, although the ‘low’ ability pupils needed to
prompt others to join in the discussions more frequently (Elicit).
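
For readers who wish to check the two-group comparisons, a one-way analysis of variance can be recomputed from the published means, standard deviations and group sizes. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that the tabled SDs are sample standard deviations (divisor n - 1), which the paper does not state.

```python
def f_ratio_two_groups(mean1, sd1, n1, mean2, sd2, n2):
    """One-way ANOVA F-ratio for two groups, from summary statistics."""
    grand_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    # Between-groups sum of squares (1 degree of freedom for two groups).
    ss_between = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
    # Within-groups sum of squares, assuming SDs were computed with n - 1.
    ss_within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    df_within = n1 + n2 - 2
    return ss_between / (ss_within / df_within)


# Number of Utterances: 'high' group (n = 49) against 'low' group (n = 56).
f_utterances = f_ratio_two_groups(25.9, 15.8, 49, 18.2, 9.7, 56)
print(round(f_utterances, 2))
```

With the Number of Utterances figures this gives a value of roughly 9.3, in line with the reported F-ratio. Small discrepancies on other rows are to be expected because the published means and SDs are rounded.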


Factor analysis 

The earlier correlation matrix suggested that there was a common factor con- 
cerned with intellectual ability that determined the style of response in the written 
judgments and the spoken discourse. It was not possible to judge whether there were 
any other common factors underlying the pupils’ behaviour. A factor analysis was 
therefore carried out which included all the variables used in the statistical analyses of 
the written and spoken discourse. This factor analysis confirmed that there was a 
factor concerned with high-level intellectual ability (Factor 1), but it also produced 
good evidence for a second factor which was concerned with the style of the dis- 
cussions rather than with intellectual ability (Factor 2). This second factor was 
related to the first only to a very limited extent, the only real overlap being the Number 
of Utterances. Two principal factors had eigenvalues greater than unity and
accounted for 48.2 per cent of the variance. These factors were then rotated according
to Kaiser's varimax criterion to simple orthogonal structure (see Table 6).
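
The rule of retaining factors with eigenvalues greater than unity can be illustrated on a correlation matrix. The Python sketch below uses only seven of the variables from Table 4 (AH4 Total is left out because it is the sum of the Verbal and Non-verbal parts), so it is not a reproduction of the factor analysis reported here, which included all the written and spoken measures. It also stops at the eigenvalues and does not attempt the varimax rotation.

```python
import numpy as np

labels = [
    "AH4 Verbal", "AH4 Non-verbal", "Generalising Test",
    "Response to Literature", "Written Judgment (no prep.)",
    "Written Judgment (after disc.)", "Cognitive level (spoken disc.)",
]

# Lower triangle of the correlation matrix, taken from Table 4.
lower = [
    [1.00],
    [0.73, 1.00],
    [0.64, 0.45, 1.00],
    [0.50, 0.39, 0.40, 1.00],
    [0.62, 0.47, 0.42, 0.44, 1.00],
    [0.69, 0.52, 0.48, 0.46, 0.78, 1.00],
    [0.61, 0.52, 0.37, 0.40, 0.56, 0.64, 1.00],
]

n = len(lower)
R = np.zeros((n, n))
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        R[i, j] = R[j, i] = r  # fill both halves of the symmetric matrix

# Eigenvalues of the symmetric matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
retained = int((eigenvalues > 1.0).sum())        # eigenvalue-greater-than-unity rule
pct_variance = 100.0 * eigenvalues / n           # share of total variance per component

print("factors retained:", retained)
print("per cent of variance:", np.round(pct_variance, 1))
```

Because every correlation in this subset is positive and moderate to high, a large first eigenvalue is to be expected, which corresponds to the general ability factor described in the text.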


TABLE 6
VARIMAX ROTATED FACTOR MATRIX

                    Variables                          Factor 1   Factor 2
Written Measures    Verbal Intelligence (AH4)          0.90
                    Non-verbal Intelligence (AH4)      0.72
                    Peel Generalising Test             0.57
                    Response to Literature             0.46
                    Mean Written Judgment              0.80
Spoken Measures     Number of Utterances               0.39       0.66
                    Percentage of Target Discourse                -0.71
                    Level of Target Discourse          0.73
                    Describer Questions
                    Explainer Questions                0.36
                    Support                                       0.56
                    Task                                          0.69
                    Elicit
                    Aside                                         0.68
Percentage of variance accounted for:                  28.7       19.5
Loadings below 0.30 are omitted


Factor 1 appeared to be a high-level intellectual ability which involved a capacity
to use language to interpret and explain (mean written judgments, level of Target
discourse, Explainer questions), and, to a lesser extent, a capacity to select general
notions (Peel’s Generalising Test and Response to Literature). However, the factor
was not concerned exclusively with verbal reasoning since it included the more general
intellectual skills required for both Verbal and Non-verbal parts of the AH4 Test of
General Intelligence.


Factor 2: The second factor revealed a polarity between the relevant, purposeful 
discourse which concentrated on the passage and questions (Percentage of Target 
discourse) and discourse which was primarily concerned with social interaction. A 
pupil who scored high in this factor would talk at some length, but would be concerned 
with social relationships and with monitoring the task of discussion. He would be 
less concerned with ideas and argument arising from the text, would be effective in the 
social context but less effective intellectually. A pupil who scored low on this factor 
would, in contrast, limit his contributions to those that were directly relevant to the 
issues arising from the text. 


Comparison of the discussion groups as units 

So far in this statistical analysis of the spoken discourse, the scores of individual 
pupils have been compared either in the form of correlations, or when grouped 
according to the pupils’ abilities. These analyses have revealed a positive relationship 
between individuals’ intellectual ability and the style of their written and spoken 
responses. They have not, however, provided any information about the effects of 
combining small groups of pupils of roughly equivalent ability in a discussion context. 
In this section, the style of the discussions of groups who achieved a high cognitive 
level of Target discourse will be compared to that of groups who sustained a lower 
level of Target discourse. We shall, therefore, be able to find out what effect this new 
measure of spoken discourse had on the interaction as a whole. 


The 34 discussion groups were placed in rank order according to the mean 
cognitive level of the Target discourse achieved by the group. They were then divided 
into 17 ‘high’ (Mean 2.58, SD 0.22) and 17 ‘low’ (Mean 1.95, SD 0.25) groups, with
the cut-off point falling between 2.29 and 2.35. Nine of the ten measures (excluding
level of target discourse) were included in the comparison of the groups. Results are 
given in Table 7.


TABLE 7
COMPARISON OF HIGH SCORING AND LOW SCORING DISCUSSION GROUPS

Analysis of Variance

Variable                          Level   Mean    SD      F-ratio   Level of Significance
Length of Discussion (Minutes)    high    9.11    3.27
                                  low     5.74    2.01    12.3      P<0.01
Number of Utterances              high    81.9    39.0
                                  low     56.5    17.7    5.61      P<0.05
Percentage of Target Discourse    high    80.7    8.8
                                  low     75.5    9.6     2.53      NS
Describer Questions               high    1.76    1.48
                                  low     1.12    1.18    1.87      NS
Explainer Questions               high    3.00    3.55
                                  low     0.29    0.57    9.07      P<0.01
Support                           high    6.71    6.17
                                  low     5.47    3.90    0.46      NS
Task                              high    7.06    8.62
                                  low     4.71    2.16    1.12      NS
Elicit                            high    0.59    0.84
                                  low     1.53    1.61    4.18      P<0.05
Aside                             high    3.76    4.22
                                  low     2.12    2.89    1.66      NS


The pattern of differences which emerged from this comparison was identical to 
that which emerged from the comparison of high ability and low ability pupils. In the 
discussion groups where pupils were operating at a consistently high cognitive level, 
discussions were longer, individuals talked at greater length and asked more Explainer 
questions (53 compared to 5). In groups in which the discourse remained at a con- 
sistently low cognitive level, pupils needed to prompt others to join in the discussion 
more frequently (Elicit). The size of the discussion group had no significant effect on 
any of the measured variables. 
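
The F-ratios in Table 7 can be checked from the reported means and SDs alone. The sketch below is an illustration, not the authors' procedure: the function name is hypothetical, and it assumes two groups of 17 and SDs computed with divisor n (the population form), which is an assumption the article does not state.

```python
def f_ratio_two_groups(mean_a, sd_a, mean_b, sd_b, n):
    """One-way ANOVA F-ratio for two equal-sized groups, from summary statistics.

    Assumption: sd_a and sd_b were computed with divisor n, so n * sd**2
    recovers each group's sum of squared deviations.
    """
    grand_mean = (mean_a + mean_b) / 2
    ss_between = n * ((mean_a - grand_mean) ** 2 + (mean_b - grand_mean) ** 2)
    ss_within = n * (sd_a ** 2 + sd_b ** 2)
    df_between = 1           # two groups
    df_within = 2 * n - 2    # 32 when n = 17
    return (ss_between / df_between) / (ss_within / df_within)


# Length of Discussion row of Table 7: high 9.11 (3.27), low 5.74 (2.01)
f_length = f_ratio_two_groups(9.11, 3.27, 5.74, 2.01, 17)
print(round(f_length, 1))
```

Under that assumption the Length of Discussion row gives a value close to the tabled 12.3; if the SDs were instead computed with divisor n - 1, the result would be somewhat larger.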


Linguistic analysis of the discourse 
The first two pages of each group’s transcript were used for the linguistic analysis. 


Measures of tentativeness 

The total number of words on the two pages was counted as well as the total 
number of expressions of tentativeness and the ratio which resulted was expressed 
initially as a percentage. Converted to decimals, this figure was used in a further 
comparison of the discussion groups. Results are given in Table 8. 


TABLE 8 
EXPRESSIONS OF TENTATIVENESS IN HIGH AND LOW DISCUSSION GROUPS 

                                                Expressed as fractions 
         Mean number of    Total number of 
         words on first    expressions of       Mean per 
         two pages         tentativeness        group ± SD           No. 
high     624.5             208                  0.0196 ± 0.0086      17 
low      565.8             109                  0.0116 ± 0.0069      17 
Analysis of variance 

The analysis of variance showed that there was a significant difference in the 
number of expressions of tentativeness between the high and low groups (F=8.5, 
P<0.01). 


G/T ratio 

The mean G/T ratio (number of general nouns and pronouns/total number of 
nouns and pronouns) for the 17 ‘high’ and 17 ‘low’ groups was calculated 
separately. Means and SDs are given in Table 9. An analysis of variance showed 
that there was a significant difference in the means at the 5 per cent level. 











TABLE 9 
MEAN G/T RATIO FOR HIGH AND LOW GROUPS 

         Mean G/T Ratio    Analysis of Variance 
high     0.58 
low      0.41              F=4.34   P<0.05 


When the mean G/T ratios of the five ‘ highest’ and five ‘ lowest’ discussion 
groups, selected from the initial rank ordering, were compared, there was a difference 
between the groups at the 0.1 per cent level of significance (Table 10). 


TABLE 10 
MEAN G/T RATIO FOR 5 HIGHEST AND 5 LOWEST GROUPS 

              Mean G/T Ratio    Analysis of Variance 
5 highest     0.66 
5 lowest      0.43              F=23.9   P<0.001 


Qualitative analysis of the discourse 

Beyond the results of the statistical comparisons of the discourse of the pupils, 
convincing evidence for the relationship between intellectual ability and pupils’ modes 
of thinking in the present research emerged from a qualitative analysis of the spoken 
discourse (Michell, 1978a). This close analysis of the transcripts was restricted to the 
10 discussion groups who achieved the highest and lowest scores for the level of 
Target discourse. This gave five ‘ highest’ and five ‘lowest’ groups. These groups 
did represent the extremes within the sample, but for this reason the discourse offered 
some very clear contrasts. From the qualitative analysis it was clear that individuals 
who were operating fairly consistently at an Explainer level of thinking were able, as 
a group, to achieve a structure in the ongoing discourse and a complex understanding 
of issues which were not available to pupils who operated at the lower levels. 


There were three central differences between the two groups which ensured that 
pupils in the ‘ high’ group were at a constant advantage over the pupils in the ‘ low’ 
group. These pupils were able to sustain discussion by offering longer, more extended 
responses, by building upon one another’s contributions, and by introducing into the 
discussion a wide range of new and relevant knowledge and information. They were 
able to focus attention on key themes and issues by constantly monitoring the dis- 
cussion, by using Explainer questions to ensure that only relevant arguments were 
included, and by admitting anecdotal material only if it was relevant to the more 
general themes of the passage. A few pupils in the ‘ high’ groups offered comments 
which served the function of synthesis in that they summarised arguments that had 
gone before, and weighed various possible viewpoints or solutions. In terms of 
interaction, pupils in the ‘ high’ groups had less need to use Elicit strategies since 
individuals entered into and sustained the discussion more easily and spontaneously. 


DISCUSSION 


The results of the comparison of the ‘ high’ and ‘ low’ ability pupils, and of the 
discussion groups who achieved high and low scores for the cognitive level of the 
Target discourse, offer evidence of a positive relationship between the intellectual 
quality of the pupils’ spoken discourse and their intellectual maturity. Admittedly 
this statement needs some qualification and this will be discussed first. 


(1) The present state of knowledge about intelligence tests makes it very difficult 
to define in precise terms the intellectual abilities that are being measured by such tests 
and one must therefore be cautious about making claims about ‘ intelligence’ as a 
single, global attribute. However, the factor analysis suggested that there was a single 
factor, presumably intellectual, which largely governed the pupils’ performance on the 
ability tests, their written judgments, and the cognitive level of their discourse. 


(2) There are strong arguments against making judgments about children’s 
thinking on the basis of a single sample of their written or spoken language. As Labov 
(1969) has demonstrated, differences between test contexts can affect the language used 
by the subject. If pupils had said very little in the discussions (as was the case in 
Labov's example), one would not have been justified in concluding that they were not 
capable of using language in the ways demanded by the discussion context, but this 
was not the case. Those pupils whose discourse was analysed did respond in a 
particular style and at a fairly consistent cognitive level, and these were the features 
that were measured. 


The correlations between the level of Target discourse and the ability test scores 
suggested a relationship between these two variables and this was later confirmed by 
the comparisons of the ‘ high’ and ‘low’ ability pupils and by the factor analysis. 
There was a significant difference at the 0.1 per cent level in the cognitive level of the 
Target discourse between the ‘high’ and ‘low’ ability pupils, and the discussion 
groups who achieved a high cognitive level of discourse asked a total of 52 Explainer 
level questions compared to 5 in the ‘low’ groups. There was no similar difference 
between the groups for the number of Describer questions. This was not surprising 
since there had to be a basic agreement about the events reported in the passage before 
any interpretative comments could be made. At times, therefore, all the groups 
needed to focus closely on the text, clarifying points of conflict before continuing with 
the discussion. However, at this point, the consensus of opinion signalled the end of 
the discussion for many of the ‘low’ groups, while for the other pupils, especially 
those in the ‘high’ groups, clarification of a point in the text was only a starting point 
for fruitful questioning and analysis that followed. 


In an earlier study (Michell and Peel, 1977) a relationship was found between the 
kinds of interactional strategies used most frequently by the pupils and the level of 
Target discourse achieved by the group. Support, for example, was associated with 
Explainer level discourse. In the present research, however, there was a more or less 
consistent pattern of interaction for all the groups. Possibly the slightly different test 
context was responsible for this result since the questions directed the discourse more 
effectively and there was not the same need to negotiate the direction of the discussion. 
In addition the pupils in the present study were older; possibly this meant that, 
irrespective of their intellectual ability, they were mature enough to handle the social 
relationships within the groups with confidence and without open conflict. 


The results from the linguistic analyses added further support to the conclusion 
that the intellectually able pupils approached the discussion in a more open way and 
were able to view the given information in a wider context of more general knowledge. 
Pupils who sustained discussion at the higher cognitive levels used more general 
referents and clearly this was related to the fact that their discussions were not limited 
to specific events, individual examples and personal anecdotes. Their tolerance of 
ambiguity and their resistance to offering prematurely closed solutions were reflected 
in the use of more expressions of tentativeness. 


Results both from this investigation and from our earlier study (Michell and 
Peel, 1977) make it doubtful whether, at least for the less able pupil, peer group 
discussions are an effective method of preparing for written work. This is not to say 
that there is no place for this kind of learning context in the classroom, but rather that 
its limitation as a context for learning should be recognised. The distinction between 
Describer and Explainer thinking has made explicit some of the differences that can be 
observed in the discourse of pupils of different intellectual ability. Some pupils 
undoubtedly benefit from informal discussions since they can introduce new ideas and 
relevant information, sustain discussion, consider and evaluate alternative possibilities 
and view specific events within a more general context. However, less able pupils 
probably cannot carry out these strategies for themselves. The teacher's role, there- 
fore, will be to provide some structure within the discussions and to help pupils, 
particularly those of lower ability, to achieve some understanding at more interpretive 
and imaginative levels. In addition, he may need to guide those pupils who score 
highly on our factor 2 towards a concentration on issues arising from the text rather 
than on social exchanges. Whether practice in this kind of guided discussion could 
eventually bring about persistent improvements in pupils’ modes of thinking, and in 
the intellectual strategies they could draw on, has yet to be tested. 
REMOVING THE MARKS FROM EXAMINATION SCRIPTS BEFORE 
RE-MARKING THEM: DOES IT MAKE ANY DIFFERENCE? 


By R. J. L. MURPHY 
(Associated Examining Board Research Unit, Aldershot) 


SUMMARY. Two senior GCE examiners re-marked photocopies of the same two hundred GCE 
examination scripts. One hundred of these scripts still had the marks and comments of the 
original examiners on them, whereas these were removed from the other one hundred. Com- 
parisons were made between the marks awarded by the two examiners to these scripts and the 
original marks and it was found that removing previous marks and comments made a consider- 
able difference to the extent to which these three sets of marks agreed. 


INTRODUCTION 


There are many situations where examination scripts are marked by one examiner and then 
re-marked by another examiner. One examiner may be checking on the marking standards 
of the other examiner (Black, 1962), or else the marks of the two examiners may be averaged 
in order to attempt to produce a more reliable assessment (Wiseman, 1949; Wood and 
Quinn, 1976). It has been suggested by Pilliner (1965) that one of the critical factors which 
affect the re-marking of scripts is whether or not the second examiner is aware of the marks 
awarded by the first examiner. In fact, Pilliner suggests that if the second examiner is aware 
of the marks awarded by the first examiner then this invalidates the independence of the two 
assessments of the script. Furthermore, an impression has been gained from re-marking 
investigations (e.g. Murphy, 1978) that more extreme differences in marking standards are 
revealed when previous marks and comments are removed from scripts. It would seem that, 
however much an examiner tries to ignore the judgments of a previous examiner when he is 
re-marking scripts, his own impression of the scripts is bound to be influenced. 


The aim of this investigation was to test this view by comparing the results of re-marking 
two sets of scripts, one set with previous marks and comments on them and the other set 
with these removed. 


METHOD 


Two systematic samples of 100 scripts were selected from the scripts of all the candidates 
entering for a General Certificate of Education (GCE) O-level examination. The scripts 
related to an examination paper in which the candidates had been required to answer four out 
of a selection of essay questions in two hours. These two samples were systematically 
selected in that each contained approximately equivalent numbers of candidates from the 
allocations of each of the examiners who had originally marked that examination. 


One of these samples of scripts had all previous marks and comments removed from them, 
and the two sets were then photocopied to provide a duplicate copy of each script. Two 
senior GCE examiners were then each sent a complete set of the 200 scripts and were asked 
to re-mark them, “as though they were being marked for the first time”. They were also 
asked to follow the same marking instructions as had been used in the original marking of the 
examination. 


The results of these re-markings were then compared to see whether the marking of the 
two examiners was affected in any way by the presence of previous marks and comments on 
half of the scripts. Three major comparisons were possible, in terms of the extent of the 
agreement between the two re-marking examiners with each other and in terms of the agree- 
ment of both of them individually with the original marks which had been awarded to the 
scripts (i.e. by the normal marking procedures). 


RESULTS 


The results compare three marks awarded independently to each of the 195 scripts (five 
of the original 200 had to be excluded as marks for these scripts were not received from one 
of the examiners in the experiment). Of these 195 scripts, 95 were re-marked by the two 
examiners with the previous marks and comments on them, and 100 were re-marked after all 
previous marks and comments had been removed. Table 1 gives Pearson's product moment 
correlation coefficients for the three sets of marks for the two samples of scripts. 


TABLE 1 
CORRELATIONS BETWEEN ORIGINAL MARKS AND TWO RE-MARKS 

                               Original Marks    Original Marks    Re-Mark 1 
                               v Re-Mark 1       v Re-Mark 2       v Re-Mark 2 
Marks-on Sample (N=95)         0.94              0.96              0.94 
Marks-off Sample (N=100)       0.86              0.85              0.87 


This table provides a very striking confirmation of the original hypothesis that the 
marking of the two examiners would be affected by whether or not previous marks and 
comments had been removed from the scripts. The ‘original marks’ are the marks awarded 
to these scripts by the team of examiners who marked them as part of the normal examination 
marking procedures. ‘Re-Mark 1’ and ‘Re-Mark 2’ refer to the marks awarded by the 
two senior examiners who re-marked the scripts. The difference between the correlation 
coefficients for scripts marked with marks on and marks off in each of the three cases is highly 
significant (Fisher's Z, P<0.01, in each case). 
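
The Fisher's Z comparison of two independent correlations can be sketched as follows. This is a minimal illustration of the standard test applied to the Table 1 coefficients; the function name is hypothetical, and the two-tailed normal p-value is an assumption, since the article does not say whether a one- or two-tailed test was used.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two independent Pearson correlations using Fisher's z transform.

    Returns the z statistic and a two-tailed p-value from the normal distribution.
    """
    z1 = math.atanh(r1)                      # Fisher transform of each r
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-tailed p (assumed)
    return z, p


# (marks-on r, marks-off r) for the three pairings in Table 1
pairings = {
    "Original v Re-Mark 1": (0.94, 0.86),
    "Original v Re-Mark 2": (0.96, 0.85),
    "Re-Mark 1 v Re-Mark 2": (0.94, 0.87),
}
results = {
    name: fisher_z_difference(r_on, 95, r_off, 100)
    for name, (r_on, r_off) in pairings.items()
}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.4f}")
```

With the sample sizes of 95 and 100, each of the three differences falls below the 0.01 level under this calculation, in line with the reported result.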


Correlation coefficients, and in particular differences between correlation coefficients, can 
in certain circumstances give a somewhat misleading impression of the true nature of the 
relatedness of sets of scores (Wood, 1978). Table 2, however, shows that these correlation 
coefficients are based on sets of scores with similar means and standard deviations. 


TABLE 2 
MEANS AND STANDARD DEVIATIONS OF THE MARKS AWARDED 
TO THE TWO SETS OF SCRIPTS 

                     Marks-on Sample     Marks-off Sample 
                     (N=95)              (N=100) 
                     Mean     SD         Mean     SD 
Original Marks       41.6     11.5       42.4     13.0 
Re-Mark 1            41.1     11.1       40.7     12.8 
Re-Mark 2            41.2     11.4       41.5     12.4 


Taking into consideration the similarity of the means and standard deviations shown in 
Table 2, it would seem justifiable to assume that the differences in the correlation coefficients 
are due to a general increase in agreement between the two re-marking examiners’ marks and 
the original marks in those cases where the re-marking examiners were aware of the original 
marks. In addition, leaving the original marks and comments on some of the scripts would 
appear to have produced greater agreement between the two re-marking examiners, who, by 
being influenced by these earlier judgments, tended to agree more closely with each other. 
The exact nature of the evidence used to support this conclusion can be seen in Figure 1, 
which shows two scatter diagrams illustrating the extent of the agreement between the two 
re-marking examiners for the marks-on and marks-off samples. 


Further evidence may be found in Figure 2, which illustrates the differences between the 
average mark changes which occurred for scripts re-marked with and without original marks 
and comments still on them. These are shown between the three different sets of marks for 
the two samples of scripts. The average mark change is calculated by adding the gross 
change in marks (regardless of whether this is an increase or a decrease) for each of the 
candidates, and then dividing the total by the number of candidates. 
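
The average mark change described above amounts to a mean absolute difference between two sets of marks. A minimal sketch follows; the function name and the four pairs of marks are hypothetical and are not data from the investigation.

```python
def average_mark_change(marks_a, marks_b):
    """Mean gross (absolute) change in marks between two markings of the same scripts."""
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("both mark lists must be non-empty and of equal length")
    # Sum the size of each change, ignoring whether it is an increase or a decrease
    total_change = sum(abs(a - b) for a, b in zip(marks_a, marks_b))
    return total_change / len(marks_a)


# Hypothetical marks for four candidates (illustration only)
original_marks = [41, 55, 38, 62]
re_marks = [44, 52, 38, 57]
change = average_mark_change(original_marks, re_marks)
print(change)
```

Because increases and decreases are not allowed to cancel, this index can be large even when the two sets of marks have identical means.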


FIGURE 1 

SCATTER DIAGRAMS ILLUSTRATING MARKS AWARDED BY TWO SENIOR EXAMINERS 
TO THE MARKS-ON AND MARKS-OFF SAMPLES OF SCRIPTS 

[Scatter diagrams not reproduced. Surviving panel title: Marks-On Sample (N=95).] 


Figure 2 (a) demonstrates that, when the first senior examiner re-marked these scripts, 
his marks tended to agree more closely with the original marks in the cases where these marks 
still appeared on the scripts. Where the original marks and comments had been removed 
from the scripts, there were much greater variations between the marks which the first senior 
examiner awarded and the original marks. 


Figure 2 (b) indicates that exactly the same effect occurred when the other senior examiner 
marked the scripts. Again, the greatest variations occurred when he was unaware of the 
original marks awarded to the scripts. 


Figure 2 (c) permits a comparison to be made between the marks awarded to the scripts 
by the two senior examiners. Here, it can be seen that the variations between their judgments 
were considerably reduced when they were marking scripts with the original marks and 
comments still on them. This is clearly the result of the phenomenon illustrated in (a) and 
(b), where it was shown that these two individual examiners were tending to modify their 
judgments of the scripts in the direction of the original marks which appeared on them. 
Because they were both modifying their judgments towards the same set of original marks, it 
is inevitable that the variations between their marks for those scripts were less than for those 
scripts where the original marks and comments had been removed. 


Thus, the differences caused by removing the original marks and comments from the 
scripts are clearly demonstrated in each of these three cases. The differences between the 
results shown in Figure 2 for the two samples of scripts are again highly significant in each 
case (t test, P<0.01, for all three pairings). These findings show that differences between 
examiners’ judgments will be much more apparent if they are given clean scripts to re-mark, 
rather than scripts bearing previous marks and comments. 


DISCUSSION 


On the whole, the results of this investigation appear to provide conclusive evidence that 
removing previous marks and comments from scripts does make a considerable difference. 
It would seem that examiners who are asked to re-mark scripts cannot help but be influenced 
by these previous judgments, however much they try to ignore them and form their own 
opinion. 

The only possible doubt which could be raised about these otherwise extremely clear-cut 
findings is that the scripts which had the marks removed from them were a different set of 
scripts from the set which had the marks left on them. The two sets of scripts represent two 
systematic samples which we need to be able to assume were at least equivalent, in that 
neither was more likely to produce unreliable marking than the other. Support for this 
assumption can be found in the data from other re-marking investigations (e.g. Murphy, 
1978) where a number of samples of 200 scripts produced very similar degrees of marking 
unreliability when they were divided up into two smaller samples of 100 scripts. Thus, we 
would contend that two samples of 100 scripts which have been drawn in a fairly arbitrary 
way (either randomly or systematically across examiners) in normal circumstances could be 
expected to produce reasonably equivalent degrees of marking unreliability. As the findings 
of this investigation were a long way away from being equivalent for the two different 
samples, it must be assumed that these different results were largely the effect of removing the 
previous marks and comments from one sample and not from the other sample. 


Thus, if a script is to be re-marked and the intention of having the script re-marked is to 
obtain a second unbiased assessment of that script, then it is essential that previous marks 
and comments are removed from that script before it is re-marked. Where previous marks 
and comments are not removed from a script, these are likely to influence considerably the 
judgment of the examiner re-marking the script in the direction of the marks awarded by the 
original examiner. 


This conclusion would appear to be applicable to both of the situations, mentioned in 
the introduction, where examination scripts are marked by more than one examiner. Whether 
the additional examiner’s mark is to be used as a check on the marking standards of the first 
examiner, or whether it is to be combined with it as in the case of multiple marking pro- 
cedures, it would seem to be necessary to obtain from the examiner a mark which is unbiased 
by the previous mark awarded to the script. 
THE EFFECTS OF CONCEPTUAL STYLE PREFERENCE, RELATED 
COGNITIVE VARIABLES AND SEX ON ACHIEVEMENT IN 
MATHEMATICS 


By D. A. ROACH 
(College of Arts, Science and Technology, Jamaica, West Indies) 


SUMMARY. The Conceptual Style Test, a mathematics achievement test and an intelligence test 
were administered to grade 6 children, 206 boys and 212 girls, in five urban Jamaican elementary 
schools. Mathematics achievement had significant positive correlations to analytic conceptual 
style and intelligence; girls had higher mathematics achievement than boys. Analytic conceptual 
style had a significant positive correlation with intelligence, but had no relation to sex. When 
intelligence was partialled out, the relation between conceptual style preference and mathematics 
achievement became non-significant. 


INTRODUCTION 


This paper seeks to examine the effects of conceptual style preference, sex and intelligence 
on mathematics achievement. It will also examine what effects sex and intelligence have on 
the relation between conceptual style preference and mathematics achievement. While this 
investigation seeks to extend earlier studies, it also examines whether the findings of studies 
on conceptual style preference from other cultures are supported in a Caribbean culture. 


Conceptual style preference is a dimension of cognitive style which refers to the type of 
conceptual relationships among objects typically formed by the child (Kagan, 1966). Kagan 
et al. (1963) identified three cognitive styles based on subjects’ grouping of common pictorial 
stimuli. The descriptive-analytic style is based on the similarity of a part of each stimulus 
(e.g. the ears on faces). The inferential-categorical style is typified by some inferred quality 
or language convention; each stimulus being an independent example of the conceptual label 
(e.g. ‘women’, ‘animals’). The relational-contextual style is typified by the association of a 
whole stimulus with another whole stimulus on the basis of functional relationships which 
may obtain between them (e.g. the woman wears the dress). 


Lee et al. (1963) obtained data which indicate that the analytic child tends to search for 
elements of similarity between objects which are embedded in larger stimulus contexts. 
Their initial response usually is to ignore those properties of objects which involve the whole 
stimulus. In other words, their strategy of choice is to divide the total stimulus and look for 
sub-elements that share a common characteristic. 


The non-analytic child examines initially the attributes which involve the entire stimulus, 
especially their functional characteristics (i.e. how each object might relate, as a whole, to 
another object). A much weaker conceptual habit for non-analytic children is the search for 
elements of similarity within the whole. 


Subjects who have been found to be analytic on the cognitive style test (CST) seem set to 
attend to more factual detail during concept acquisition (Kagan et al., 1963), are superior to 
those with a non-analytic style in learning concepts based on objective similarity of detail 
among visual stimuli (Lee et al., 1963), and score higher on performance tests than on verbal 
tests (Kagan et al., 1964). Conversely, those found to have a non-analytic cognitive style 
score better on verbal tests than on performance tests, and learn functional relationships 
better than those of analytic style (Kagan et al., 1963, 1964). 


A conceptual style test (CST) was employed by Kagan and his associates (1963) to 
identify these response preferences. The test was designed specifically to suppress the 
inferential-categorical responses, so it is misleading to dwell on data for this category; thus 
the analytic and relational responses can be regarded as if they were on opposite ends of a 
continuum (Lee et al., 1963). This study will, therefore, follow recent trends in using Kagan’s 
concept of style in a bicategorical system, analytic versus non-analytic style. 


Satterly (1968, 1976) found that for 7- to 11-year-old children, those who were analytic 
in preference scored significantly higher in mechanical arithmetic. Robinson and Gray 
(1974) found that for fifth grade children, on the Mathematical Concepts subtest of the Iowa 
Test of Basic Skills, among the three conceptual style preferences the inferential style was the 
most important predictor of performance for boys and girls. However, on the Mathematical 
Problems subtest, the analytic style was the most important predictor for both sexes. 


In the Robinson and Gray study (1974) significant positive correlations were obtained 
between intelligence and each conceptual style preference. Satterly (1976) found a correlation 
between analytic style and verbal intelligence of 0.10 (not significant) for 10- to 11-year-old 
boys; for general ability and analytic style it was 0.01. 


A study by Isaacs (1974) of factors related to the performance in mathematics of third- 
year pupils in Jamaican secondary schools found a significant correlation between mathematics 
achievement and verbal intelligence (r=0.80) and between mathematics achievement 
and non-verbal intelligence (r=0.63). Reid (1964) obtained a correlation (r=0.85) between 
intelligence and attainment in arithmetic and English for grade 6 Jamaican school children. 


Robinson and Gray (1974) found a significant relation between the Mathematical 
Problems subtest of the Iowa Test of Basic Skills and an analytic conceptual style for both 
sexes, after the effects of verbal and non-verbal intelligence were partialled out. The correlation 
between inferential style and school learning for girls was substantially reduced after 
variance attributed to verbal and non-verbal intelligence was removed from school-learning 
variables. 


Thompson (1969) found that for both urban and rural mixed schools in Jamaica there 
was no sex difference in the performance of boys and girls in the General Certificate of 
Education (GCE) examinations. Isaacs (1974) and Vernon (1961b) found that there was no 
sex difference in mathematics achievement of boys and girls in Jamaican secondary 
schools. 


Satterly (1968) administered a test of preference for analytic versus synthetic attitudes to 
a sample of 200 children aged 7-11 years. For the entire sample he found that boys tend to be 
more analytic than girls when the task demands it: the difference reached significance 
(P<0.01) among the 9-11 age group but not among the 7-9 age group. On a sample of 7- 
and 8-year-old children, Ostfield and Niemark (1967) found no significant sex difference in 
analytic conceptual style. However, Stanes (1973) found boys were significantly more 
analytic than girls at the 6-year-old age group. 


The following hypotheses are therefore proposed: 

(1) There is a significant positive correlation between analytic conceptual style and 
mathematics achievement. 

(2) There is a significant positive correlation between analytic conceptual style and 
intelligence. 

(3) When intelligence is partialled out there is a significant correlation between con- 
ceptual style preference and mathematics achievement. 

(4) Boys are more analytic in conceptual style than girls. 

(5) Boys attain higher scores in mathematics achievement than girls. 


METHOD 


Sample 
Five urban elementary schools in Jamaica were randomly selected and from them 206 
boys and 212 girls were randomly chosen from grade 6. 


Materials 

Conceptual style preference: The Conceptual Style Test consisted of 19 items, each item 
consisting of 3 pictures. The pupil’s task was to pick up two pictures that were alike in some 
way and then clearly to state the reason for picking the two pictures. 


Intelligence: The test used was Reid’s Mental Test (Reid, I-J7-72), which has well 
established norms for Jamaican students. It consists of two parts. Part I is a verbal test 
(70 items) and Part II is a non-verbal test (30 items). The sample was found to be slightly 
above the national average in mental ability. 


Mathematics Achievement Test: The mathematics achievement test used is Reid’s 
Arithmetic Test (Reid, 1HS 12-73) which also has well established norms for Jamaican 
students. It consists of two parts. Part I is mainly mechanical arithmetic but also includes 
some algebra. Part II is mainly problem arithmetic but also involves some algebra and to 
a lesser extent geometry. 


The sample fell below the national average in mathematics achievement, even though it 
was above the national average in mental ability. 


Reliabilities: The Kuder-Richardson reliability coefficients for the instruments were: 
Conceptual Style Test 0.86; Mental Ability Test 0.95; and Mathematics Achievement Test 
0.97. 


RESULTS 


Hypothesis 1 

The correlations between analytic conceptual style and mathematics achievement were 
0.37 overall, with values of 0.38 for boys and 0.36 for girls; the association between 
analytic conceptual style and mathematics achievement is thus supported by this analysis. 


Hypothesis 2 

The correlations between analytic conceptual style and intelligence score were 0.51 
overall, with values of 0.54 for boys and 0.47 for girls. These findings support hypothesis 2 
(P<0.01 in each analysis). 


Hypothesis 3 

When scores on the intelligence test are controlled, the partial correlations between 
analytic conceptual style and mathematics achievement were -0.05 overall, with values of 
-0.06 for boys and -0.03 for the girls. None of these correlations is statistically significant 
and thus hypothesis 3 is rejected. 
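
The partialling-out step can be illustrated with the standard first-order partial correlation formula. The sketch below is not the authors' computation: the correlation between intelligence and mathematics achievement is not reported in the text, so the value 0.78 is a hypothetical figure chosen only because it is roughly what the formula needs to turn the reported 0.37 into about -0.05.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Overall correlations reported under Hypotheses 1 and 2.
r_style_maths = 0.37
r_style_iq = 0.51
# HYPOTHETICAL: the intelligence/mathematics correlation is not given in the text.
r_iq_maths = 0.78

partial = partial_correlation(r_style_maths, r_style_iq, r_iq_maths)
print(f"partial r (style, maths | intelligence) = {partial:.2f}")
```

Under that assumption the style/mathematics association all but vanishes once intelligence is held constant, which is the pattern the reported partial correlations describe.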


Hypothesis 4 

The mean scores and standard deviations for boys and girls on the conceptual style test 
were: boys 9.04 (5.47); girls 9.65 (5.20) (t=1.72, NS). There is thus no evidence of a sex 
difference in conceptual style. 


Hypothesis 5 

The mean scores and standard deviations for boys and girls on the mathematics 
achievement test were: boys 19.99 (18.1); girls 25.10 (19.8) (t=2.76, P<0.01). Hypothesis 5 is thus 
rejected. The girls scored significantly higher than the boys on the mathematics achievement 
test. 
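
As a rough check on the Hypothesis 5 comparison, a t statistic can be recomputed from the summary figures alone. The sketch assumes that all 206 boys and 212 girls contributed scores and that an independent-samples test without pooled variances was used; the text does not state which form of the t-test was applied, so both points are assumptions.

```python
import math


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for two independent means, variances not pooled."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Mathematics achievement: girls 25.10 (19.8), boys 19.99 (18.1).
# ASSUMPTION: group sizes are the full sample sizes given under "Sample".
t_maths = t_from_summary(25.10, 19.8, 212, 19.99, 18.1, 206)
print(f"t = {t_maths:.2f}")
```

With these inputs the statistic comes out close to the reported value of 2.76.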


DISCUSSION 


The performance of tasks in mathematics often requires the student to pay attention to 
detail (Hamza, 1951). Significant words such as ‘consecutive’, ‘squared’ etc. which the 
less analytic student may tend to overlook must be recognised in order to perform the relevant 
mathematical tasks. Thus the student with a relatively analytic conceptual style tends to have 
an advantage over his less analytic classmates in performing mathematical operations. 


The correlations found between conceptual style preference and intelligence vary from 
study to study. In Satterly’s study (1976) it was not significant. This suggests that individuals 
with an analytic conceptual style preference are not superior to those with a non-analytic style 
on intelligence; but rather that the former do better on intelligence tests which are loaded on 
those items which require paying attention to detail. Further investigation on the relationship 
between conceptual style preference and the factors of intelligence is suggested. 


The finding that there is no relationship between conceptual style preference and 
mathematics after partialling out scores from the intelligence test suggests that there is a great 
overlap between the two tests. This may in part be a reflection on the nature of the intel- 
ligence test; it could be no surprise to suggest that items in such a test required analytic 
ability, but it may be more surprising to find that a similar type of analytic ability is apparently 
required by the two, very different tests. 


Girls are usually more docile than boys (Goodenough, 1954) and consequently less likely 
to explore and analyse their environment; this would suggest that they would be less analytic 
than boys. However, their tendency to mature faster than boys (Wisenthal, 1965) may help 
to explain why in pre-adolescence, the girls are as analytic as the boys in conceptual style 
preference. 


On the poorer achievement of the boys in mathematics achievement, Vernon (1961a) 
suggests that it could be that boys in the Jamaican society resent the dominance of females 
at home and again at schools, and this has a negative effect on their academic achievement. 
Of the thirteen classrooms visited in this study only one had a male teacher. Epstein and 
Radin (1975) hypothesised that in some homes academic activities are usually considered 
feminine and therefore inappropriate for boys. Thus boys compared with girls of lower 
occupational level may perform more poorly on tests of academic achievement. Wisenthal’s 
(1965) hypothesis is also tenable, namely that in the pre-adolescent years girls mature earlier 
than boys. One would expect girls at a more mature level to be more able to perform 
mathematical tasks such as problem-solving; the maturer person would also be expected to 
have an attitude that facilitates learning and performance on mathematical tasks. These 
three factors may in part explain the differences in mathematics achievement of the sexes. 
SOCIAL CLASS AND MEMORY FOR CATEGORISED AND 
UNCATEGORISED LISTS 


By N. E. WETHERICK, SHIRLEY PATERSON AND CLARE DALLAS-ROSS 
(Department of Psychology, University of Aberdeen) 


SUMMARY. Two experiments suggest that Jensen may be wrong in supposing that middle class 
children's superiority in the recall of categorised lists is evidence of a permanent middle class 
superiority in Level II ability. Categorised and uncategorised lists of single-syllable words were 
given to 176 children aged 15, 11, 8 and 6 years. Four-way analysis of variance of the number 
of words remembered indicated that Jensen's findings may indicate only a transitory develop- 
mental phenomenon arising from the greater emphasis on verbal interaction in middle class 
families and the faster rate of verbal development in girls than in boys. 


INTRODUCTION 


Jensen (1972) proposed two levels of mental ability. Level I involved the ‘registration 
and consolidation of input and the formation of associations’ and could be tapped by tasks 
requiring output of the material presented without conceptual elaboration. Level II involved 
‘abstraction and conceptual elaboration and transfer’ and could be tapped by tasks in 
which conceptual elaboration of input was required. He further hypothesised that Level I 
ability was distributed independent of social class whereas Level II ability was more likely 
to be found in the middle class than in the lower social classes. Jensen and Frederiksen 
(1973) provided partial support for this hypothesis using a task in which 20 items were 
presented in sequence for subsequent free recall. In one condition the items were unrelated 
and recall consequently required only Level I ability. In another the 20 items consisted of 
five from each of four conceptual categories and Level II ability could be brought to bear. 
Middle and lower class groups performed equally well in the first condition but middle class 
children were superior in the second and showed greater evidence of conceptual clustering in 
recall. 


Jensen believes that middle class superiority on categorisable material is life-long and 
arises from superior (Level II) ability to organise items presented in random order into 
appropriate categories but does not make clear whether he thinks this organisation is done 
during input or during output. Wetherick (1975; see also Wetherick and Alexander, 1977) 
has, however, shown that conceptual organisation during input is an important factor since 
more items are recalled if they all come from one category than if they all come from different 
categories although, in both cases, conceptual elaboration during output is neither required 
nor possible. 


Social class differences encountered in Scottish local authority schools are unlikely to be 
as extreme as those in Jensen's subjects (where they coincided with black/white racial differ- 
ences) but it is of some interest to see whether any such differences can be detected. The 
following experiments presented six lists of eight single-syllable words at the rate of one word 
per second and required immediate free recall at the end of each list. Three of the lists 
consisted of words drawn from one conceptual category, and three of words drawn from eight 
different categories. Social class differences as hypothesised by Jensen will appear (if at all) 
in the form of middle class superiority on recall of the categorised lists in the absence of 
superiority on the uncategorised lists. 


EXPERIMENT 1 


The first experiment was conducted by Shirley Paterson in schools in Inverness. Middle 
and lower class children were taken from the same classes in the same schools on the basis of 
father's occupation. 


Sample 
Ninety-six children were employed divided into eight groups of twelve, who were male 
or female, of middle or lower social class and aged 11 or 15. 
Materials 

Categorised and uncategorised lists were presented alternately. The categorised lists 
were, respectively, names of trees (spruce, pine, elm . . .), states of the weather (wind, mist, 
rain . . .) and birds (crow, jay, wren . . .). The uncategorised lists consisted of unrelated 
words (e.g. Ann, cheese, blue . . .). 


Procedure 
Subjects were tested in class and wrote down their responses in vertical columns which 
allowed twelve spaces for each list. Instructions were as follows: 
“What you have to do is write down as many words as possible from a list that I will 
read out. There will be six lists altogether. Before I read out a list I will say ‘Ready’. 
At the end of the list I will say ‘Recall’. This is your signal to write down as many of 
the words as you can remember in any order. The number of spaces on your answer 
sheet is not necessarily the same as the number of words I shall read out so don't write 
any word for the sake of filling up a space.” 


Results 

Results are shown in Table 1 in terms of mean words recalled out of eight. They were 
subjected to four-way analysis of variance: 2 x 2 x 2 between groups x 2 (categorised vs. 
uncategorised lists) within groups. Of the main effects, sex (male 5.09, female 5.29) was not 


TABLE 1 
MEAN WORDS RECALLED OUT OF EIGHT (EXPERIMENT 1) 

                                 Male                Female 
                            Middle    Lower     Middle    Lower 
                            Class     Class     Class     Class 
Categorised     11 years     5.86      4.42      5.19      4.92 
                15 years     6.06      5.56      6.19      6.08 
Uncategorised   11 years     4.42      4.00      4.67      4.36 
                15 years     5.33      5.08      5.53      5.36 

(N in cell = 12) 


significant. Middle class children (5.41) were significantly superior to lower class children 
(4.97) (F=7.78; df=1,88; P<0.01). Fifteen-year-olds (5.65) were significantly superior to 
11-year-olds (4.73) (F=35; df=1,88; P<0.001). Recall of categorised material (5.53) was 
significantly superior to recall of uncategorised (4.84) (F=66; df=1,88; P<0.001). The 
main point of interest is, however, the interaction between social class and categorised/ 
uncategorised material. This did not reach significance (F=3.09; df=1,88; NS) but the 
triple interaction with sex was significant (F=4.01; df=1,88; P<0.05). No other interaction 
reached significance. The significant triple interaction suggested that the analysis might 
usefully be repeated separately for the two sexes and this was done. The main effects of age 
and categorised/uncategorised material remained significant at the same level for each sex but 
the social class effect was no longer significant (F<1) in the female group and no interaction 
reached significance. In the male group, middle class subjects were significantly superior to 
lower class (F=8.61; df=1,44; P<0.01) and the interaction between social class and 
categorised/uncategorised material was significant (F=8.23; df=1,44; P<0.01), middle 
class subjects being superior on categorised material but not on uncategorised. No other 
interaction was significant. 
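
The main-effect means quoted above can be recovered from the cell means of Table 1, since every cell holds the same number of children. The following sketch reads the table entries as two-decimal values; the dictionary layout and the helper names `marginal_mean` and `class_gap` are illustrative choices, not part of the original analysis.

```python
from statistics import mean

# Cell means from Table 1, keyed by (material, age, sex, social class).
TABLE_1 = {
    ("categorised", 11, "male", "middle"): 5.86,
    ("categorised", 11, "male", "lower"): 4.42,
    ("categorised", 11, "female", "middle"): 5.19,
    ("categorised", 11, "female", "lower"): 4.92,
    ("categorised", 15, "male", "middle"): 6.06,
    ("categorised", 15, "male", "lower"): 5.56,
    ("categorised", 15, "female", "middle"): 6.19,
    ("categorised", 15, "female", "lower"): 6.08,
    ("uncategorised", 11, "male", "middle"): 4.42,
    ("uncategorised", 11, "male", "lower"): 4.00,
    ("uncategorised", 11, "female", "middle"): 4.67,
    ("uncategorised", 11, "female", "lower"): 4.36,
    ("uncategorised", 15, "male", "middle"): 5.33,
    ("uncategorised", 15, "male", "lower"): 5.08,
    ("uncategorised", 15, "female", "middle"): 5.53,
    ("uncategorised", 15, "female", "lower"): 5.36,
}


def marginal_mean(position, level):
    """Unweighted mean of every cell at one level of one factor (equal cell sizes)."""
    return mean(v for key, v in TABLE_1.items() if key[position] == level)


def class_gap(sex, material):
    """Middle-class minus lower-class mean for one sex and one type of list."""
    def level_mean(social_class):
        return mean(
            v for (m, _age, s, c), v in TABLE_1.items()
            if m == material and s == sex and c == social_class
        )
    return level_mean("middle") - level_mean("lower")


for position, level in [(2, "male"), (2, "female"), (3, "middle"), (3, "lower"),
                        (1, 11), (1, 15), (0, "categorised"), (0, "uncategorised")]:
    print(f"{level}: {marginal_mean(position, level):.2f}")

for sex in ("male", "female"):
    for material in ("categorised", "uncategorised"):
        print(f"{sex}, {material}: class gap {class_gap(sex, material):.2f}")
```

The class gaps make the triple interaction visible: for boys the middle-class advantage is much larger on categorised than on uncategorised lists, while for girls the two gaps are small and similar.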


Discussion 

Jensen's hypothesis receives partial support—in older children and with less marked 
social class differences, but only in the male group not in the female group. If, however, it 
were the case that Jensen's hypothesis concerned only a transitory developmental difference 
in which the middle class children he tested were superior to the lower class because they 
were progressing at a faster rate towards a similar final level of ability, this would be consistent 
with our results, bearing in mind the widely accepted view that females reach their peak of 
verbal ability earlier than males. In our results the effect is present in males but has already 
disappeared in females because of their relatively greater verbal maturity. If this were the 
case, a social class difference might be expected in younger children in females, but not in 
males. 


EXPERIMENT 2 


The second experiment was conducted in Aberdeen schools by Clare Dallas-Ross. 
Middle and lower class children were drawn from different schools but their class affiliation 
was checked by father’s occupation. 


Sample 
Eighty children were employed divided into eight groups of ten, who were male or 
female, of middle or lower social class and aged 6 or 8. 


Materials 

Categorised and uncategorised lists were presented alternately. Categorised lists were, 
respectively, names of domestic animals (cow, horse, pig . . .), colours (red, black, green . . .) 
and states of the weather (snow, sleet, fog . . .). Uncategorised lists consisted of unrelated 
words (e.g. owl, nose, bread . . .) 


Procedure 
Subjects were tested individually and responses were recorded by the experimenter. The 
instructions were as follows: 
“I am going to read you some lists of words. When I have finished reading each list, I 
want you to say back to me all the words you can remember in any order.” 


Results 

Results are shown in Table 2. They were subjected to the same four-way analysis of 
variance. Of the main effects, sex (male 3.93, female 3.98) was not significant. Middle class 
children (4.08) were again significantly superior to lower class (3.71) (F=6.81; df=1,72; 


TABLE 2 
MEAN WORDS RECALLED OUT OF EIGHT (EXPERIMENT 2) 

                                 Male                Female 
                            Middle    Lower     Middle    Lower 
                            Class     Class     Class     Class 
Categorised     6 years      4.43      3.70      4.87      4.33 
                8 years      5.27      5.20      5.13      5.17 
Uncategorised   6 years      2.33      2.07      3.07      2.27 
                8 years      3.80      3.63      3.70      3.33 

(N in cell = 10) 


P<0.05). Eight-year-olds (4.40) were significantly superior to 6-year-olds (3.38) (F=41; 
df=1,72; P<0.001). Recall of categorised material (4.77) was significantly superior to recall 
of uncategorised (3.03) (F=310; df=1,72; P<0.001). The only significant interaction was 
that between sex, age and categorised/uncategorised material (F=5.44; df=1,72; P<0.05). 
This arose because 6-year-old males scored relatively worse than both 6-year-old females and 
8-year-old males on uncategorised material, but not on categorised material. The analysis 
was repeated separately for the two sexes as before. The main effects of age and categorised/ 
uncategorised material remained significant for each sex. The social class effect was now 
significant for females (F=5.80; df=1,36; P<0.05), but not for males (F=2.02; df=1,36; 
NS). No interaction reached significance in either sex group. 


Discussion 
In younger children it appears that middle class females are superior to lower class but 
(in the absence of a significant interaction) their superiority is on both categorised and 
uncategorised material. No such difference exists in males. The significant triple interaction 
appears to result almost wholly from a difference between males and females since the two- 
way interaction between age and categorised/uncategorised material does not approach 
significance in either sex group. The subjects in this experiment were similar in age to those 
of Jensen and Frederiksen (1973). Since sex differences loom so large it may be unfortunate 
that they took no account of sex—not even to the extent of reporting how many of their 
subjects were male and how many female. 


GENERAL DISCUSSION 


The results offer general though obviously not conclusive support for the hypothesis that 
the superiority of middle class children over lower class children on recall tasks requiring 
manipulation of categorisable material may be a relatively transitory phenomenon arising 
from two sources—firstly, from the greater emphasis on verbal interaction in middle class as 
against lower class families which brings children to their peak in this respect earlier but may 
not necessarily bring them to a higher peak and, secondly, from the fact that female children, 
irrespective of social class, reach their peak (not necessarily a higher peak) more quickly than 
males and will consequently respond to ‘middle class’ pressure at an earlier age. 


Though our results show superior recall of categorised as against uncategorised material 
in all the age groups tested (and the superiority remains in adults—Wetherick, 1975), the 
increase in recall with age is much sharper for uncategorised material than for categorised. 
For categorised material the increase is from 4.33 at age 6 to 5.97 at age 15. For uncategorised 
material, it is from 2.44 at age 6 to 5.33 at age 15. The categorised/uncategorised difference 
declines from 1.89 at age 6 to 1.57 at age 8, 0.74 at age 11 and 0.64 at age 15. It may be 
argued that this constitutes further evidence in support of our hypothesis. Language 
development necessarily involves practice in the (implicit or explicit) manipulation of 
categorisable material whereas the manipulation of uncategorisable material receives no such 
practice. Ultimately the difference in favour of categorisable material is small; the fact that 
it is larger at earlier developmental stages may be a consequence of the fact that the one 
ability is highly practised (and susceptible to different degrees of practice) and the other is not. 
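
The shrinking advantage for categorised lists can be reproduced by averaging the four cell means at each age in Tables 1 and 2 and subtracting. This is a sketch under the assumption that the table entries are read as two-decimal values; the variable names are illustrative, and the results agree with the figures quoted in the text only to within rounding of about 0.01.

```python
from statistics import mean

# Cell means at each age, in the order male-middle, male-lower,
# female-middle, female-lower (ages 6 and 8 from Table 2, 11 and 15 from Table 1).
categorised = {
    6: [4.43, 3.70, 4.87, 4.33],
    8: [5.27, 5.20, 5.13, 5.17],
    11: [5.86, 4.42, 5.19, 4.92],
    15: [6.06, 5.56, 6.19, 6.08],
}
uncategorised = {
    6: [2.33, 2.07, 3.07, 2.27],
    8: [3.80, 3.63, 3.70, 3.33],
    11: [4.42, 4.00, 4.67, 4.36],
    15: [5.33, 5.08, 5.53, 5.36],
}

# Advantage of categorised over uncategorised lists at each age.
advantage = {
    age: mean(categorised[age]) - mean(uncategorised[age])
    for age in categorised
}

for age, gap in advantage.items():
    print(f"age {age:>2}: categorised {mean(categorised[age]):.2f}, "
          f"uncategorised {mean(uncategorised[age]):.2f}, gap {gap:.2f}")
```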


How far all this bears on the proposed distinction between Level I and Level II ability 
is unclear. Level II ability must involve more than the mere ability to recall categorisable 
lists since it is also tapped by problem-solving tasks and intelligence tests. Jensen may 
simply be wrong in supposing that it is adequately tapped by the kind of task used here and 
by Jensen and Frederiksen or it may be that it is adequately tapped by their task but not by 
ours—although it is not easy to see how this could be the case. The suggestion that the 
phenomenon may be transitory receives further support from the fact that the ability appears 
to be educable. One Aberdeen primary school known to us (drawing its pupils from a lower 
class housing area) makes a fetish of categorisation to the point, for example, of allowing the 
children to use only one colour of paint on one day and hanging the paintings in groups 
according to colour! Children from this school have, in two unpublished studies, achieved 
better recall scores than middle class children on categorised material but scored substantially 
worse on uncategorised. There is no evidence that they derive any long-term benefit; from 
our argument it would follow that they may simply be enabled to reach their peak (not 
necessarily a higher peak) more quickly. 
EFFECTS OF DIFFERENTIAL FEEDBACK ON THE ANSWERING OF 
TWO TYPES OF QUESTIONS BY FIFTH- AND SIXTH-GRADERS 


By J. PEECK 
(Psychological Laboratory, University of Utrecht, The Netherlands) 


Summary. 52 fifth- and sixth-grade children read a 900-word text which was followed by a 
multiple-choice test consisting of fact and inference questions. One and a half hours later, they 
were given feedback either with or without the original text present. Four days later all children 
were retested. On both the immediate and delayed test sixth-graders performed better than 
fifth-graders, while, especially in grade 5, more fact than inference questions were answered 
correctly. These results were interpreted as corroborating recent findings on developmental 
improvement in constructive cognition. Feedback on inference questions seemed somewhat 
more effective with the original text than without. The children correctly identified almost 90 
per cent of their first-test responses at the time of the delayed test. The identification patterns 
were interpreted as providing evidence against the interference-perseveration explanation of the 
delay-retention effect. 


INTRODUCTION 


The last decade has seen an increase in studies investigating the effects of feedback on 
meaningful verbal learning and retention. Much of recent experimentation in this area has 
been concerned with the Delay-Retention Effect (DRE). This label refers to the finding that 
subjects receiving immediate knowledge of the correct responses on a multiple-choice test 
retain less than subjects for whom feedback is presented after a period of delay. 


Of special interest for educational practice are the results obtained with school-age 
children using reading materials in normal classroom settings. Here, too, some investigators 
(e.g. More, 1969; Sassenrath, 1975; Surber and Anderson, 1975) found that delayed feedback 
was superior to immediate feedback, though others (e.g. Kippel, 1974; Newman et al., 1974; 
Phye and Baller, 1970) failed to obtain this effect. 


In contrast to the interest in the feedback interval, relatively little attention has been paid 
to other factors affecting the effects of feedback in classroom settings. Thus little is known 
about the effects of different forms of feedback. Inspection of the literature shows that 
various procedures have been applied to provide students with informative feedback, but that 
little systematic study of the specific effects of these procedures has taken place. One example 
of this kind of study is the experiment by Sassenrath and Garverick (1965) in which three 
forms of feedback were compared: looking up wrong answers in the textbook, having 
answers discussed by the instructor, and checking over answers from correct ones on the 
board. Subjects in all three conditions obtained significantly higher scores on a retention 
test than a control group who had not been given feedback, the discussion method being also 
superior to the group who looked up answers in the textbook. Curiously enough, the dis- 
cussion group also did significantly better than the no-feedback group on a transfer test 
consisting of new items that had not appeared in earlier testing. 


In the present study there were two forms of feedback. In one condition, subjects were 
given feedback sheets identical to the immediate test sheets, with the correct alternatives 
circled. In the other condition subjects were given both the original text and the feedback 
sheets with correct alternatives circled. For easy reference, the lines of the text were num- 
bered; each test item on the feedback sheet was accompanied by the number of the line(s) 
in the original text relevant for that particular question. 


Since the effectiveness of different forms of feedback could very well depend on the kind 
of test question used, two types of multiple-choice questions were used in the present 
experiment, namely fact questions and inference questions. As the relationship between the 
correct answer and the relevant information in the text on which this answer was based was 
probably not immediately obvious for the inference questions, it was predicted that access to 
the text would be particularly helpful when dealing with the latter type of questions. 


Also, little is known of possible differences in the ability to profit from feedback between 
children from different age groups. In the present study, therefore, both fifth- and 
sixth-graders participated. All children obtained feedback about one and a half hours after taking 
the initial test; on the basis of DRE literature, this interval could be considered a reasonably 
favourable one for obtaining effects from feedback. Four days later the retest was given. 


Though the present study was not directly concerned with the DRE, some of the data 
collected were of relevance to current theorising on this phenomenon. At present the most 
widely accepted theory of the DRE is the interference perseveration hypothesis, put forward 
by Kulhavy and Anderson (1972). According to this hypothesis delayed feedback is more 
effective because “learners forget their incorrect responses over the delay interval, and thus 
there is less interference with learning the correct answers from the feedback” (p. 506). A 
recent study by the present author (Peeck and Tillema, in press), however, showed that 
subjects tend to remember their initial responses for a considerable length of time, with, 
apparently, little adverse effect on their ability to profit from feedback. In order to gather more 
data on this issue, all subjects were asked to identify, after they had completed the retest, the 
alternatives they selected on initial testing. 


METHOD 


Materials 

The learning material was a 900-word text, originally designed as part of a silent reading 
test. The text, “Avontuur bij nacht” (Adventure by Night), told the story of two boys at a 
camping site. The lines of the story were numbered 1-78. For this text two sets of 12 
multiple-choice questions (4 alternatives) were constructed. One set (fact questions) required 
the recall of specific factual information. The other set (inference questions) required subjects 
to make an inference from the literal message contained in the prose. For instance, in the 
text it said that only youngsters of 15 years and older were allowed to take part in a certain 
game; the corresponding inference question gave the names and ages of four pairs of children, 
and asked which pair could participate; the correct alternative was the only pair where both 
children were older than 15. The fact and inference questions were combined in random 
order to form the immediate test. The test questions were re-ordered for the delayed 
retention test. There were two forms of feedback sheets. One form (A) was identical to the 
form used for the immediate test, except that the correct alternatives were circled. The other 
(B) gave, in addition, the relevant line-number(s) of the text. 


Sample and design 

The sample consisted of children in the fifth and sixth grade of an elementary school. 
The mean age of the fifth-grade children was 10.9 years (SD=3.1 months), the mean age of 
the children in grade six was 11.9 years (SD=4.2 months). The children in each class were 
randomly assigned to one of two conditions, with the restriction that all groups contained 
about the same proportion of boys and girls. One condition was given feedback form A, the 
other condition feedback form B and the original text. In order to get equal cell frequencies 
some of the original 32 children in the sixth grade and one of the original 27 children in the 
fifth grade were randomly removed from the sample. The remaining 52 children thus 
participated in a 2 (grade) x 2 (form of feedback) x 2 (type of question) design. The results 
were analysed in 2 x 2 x 2 analyses of variance, with repeated measures on one factor. 


Procedure 

For the first part of the experiment, the groups were tested in their regular classroom. 
At the end of the morning of the first day, the children were given the text and instructed to 
read it carefully in anticipation of a test that would follow the reading. After they had 
studied the text for 15 minutes and the booklets had been collected, the subjects were given 
the initial test sheets. The multiple-choice principle was explained to the children and they 
were asked to answer all questions, and to guess if they did not know the answer. 


For the feedback part of the experiment, in the early afternoon of the same day, the 
conditions were treated in different rooms. One group in each grade was given feedback 
form A and told to look over the questions and correct alternatives carefully. The other 
group was presented with feedback form B and the original text. They were given identical 
instruction and in addition were told that with each question on the feedback sheet, the 
number of the line(s) in the original text relevant for the correct answer was given; they were 
encouraged to use these references when going over the items. For the feedback stage all 
subjects were given 10 minutes. 


Four days later all subjects took the delayed retention test. After all children had 
completed this task, they were asked to go through the text again and to indicate by under- 
lining which alternative they had chosen on initial testing. 

RESULTS 

Table 1 presents the means and standard deviations for fact and inference questions for 
both conditions in grade 5 and 6. 

TABLE 1 

MEANS AND STANDARD DEVIATIONS FOR FACT AND INFERENCE QUESTIONS ON IMMEDIATE 
AND DELAYED TEST AS A FUNCTION OF GRADE AND FEEDBACK CONDITION 

                                        Immediate test                  Delayed test 
                                     Fact         Inference          Fact          Inference 
Group     Condition                 M   (SD)      M   (SD)         M    (SD)       M    (SD) 
Grade 5   Feedback without text   8.69 (1.79)   7.31 (1.84)     10.61 (1.26)    9.69 (1.70) 
          Feedback with text      8.77 (1.73)   7.31 (0.46)     10.46 (1.33)   10.38 (1.55) 
Grade 6   Feedback without text   9.38 (1.50)   9.46 (1.45)     11.53 (0.77)   10.92 (1.32) 
          Feedback with text      9.54 (1.45)   9.15 (1.46)     11.23 (1.0)    11.23 (0.92) 


Analysis of variance of the scores on the immediate test showed significant main effects 
for grade (F=11.04; df=1,48; P<0.01), and type of question (F=9.56; df=1,48; P<0.01), 
and also a significant grade x type of question interaction (F=6.20; df=1,48; P<0.01). As 
Table 1 indicates, the sixth-grade pupils obtained higher scores than the fifth-grade children, 
while generally more fact than inference questions were answered correctly. However, 
further analysis of the interaction between grade and type of question showed that the 
differences between fact and inference questions only reached significance in grade 5 
(P<0.01). Also, the difference between the grades was much more substantial for inference 
questions (P<0.01) than for fact questions (P<0.05). As Table 1 shows, there was very 
little difference in scores between the two feedback conditions within each grade (F=0.02). 


Analysis of variance of the delayed-test results again showed a considerable difference 
between the grades (F=10.22; df=1,48; P<0.01), and a decreased, but still significant 
difference between the fact and inference questions (F=4.38; df=1,48; P<0.05). The only 
substantial interaction effect was between form of feedback and type of question (F=3.59; 
df=1,48; 0.05<P<0.10). Table 1 shows that, as expected, in both grades somewhat more 


TABLE 2 

MEAN SCORES ON CONDITIONAL PROBABILITY MEASURES R2/W1 AND R2/R1 FOR 
FACT AND INFERENCE QUESTIONS AS A FUNCTION OF GRADE AND FEEDBACK 
CONDITION 

                                        R2/W1                R2/R1 
Group     Condition                 Fact   Inference     Fact   Inference 
Grade 5   Feedback without text     .71      .68         .95      .94 
          Feedback with text        .68      .72         .96      .96 
Grade 6   Feedback without text     .92      .76         .98      .97 
          Feedback with text        .84      .86         .97      .98 
inference questions were answered correctly when subjects could refer to the original text at 
the time of feedback. Surprisingly, the inverse relationship seemed to hold for the fact 
questions: in both grades the children tended to be more successful in answering fact 
questions on the delayed test when the text had been absent at the time of feedback. 


In addition to the analyses reported above, the data of the experiment were also scored 
and analysed according to the conditional probability measures suggested by Surber and 
Anderson (1975): the conditional probability of being right on the delayed test, given a 
correct response on the first test (R2/R1), and the probability of being right on the delayed 
test, given an incorrect response on the immediate test (R2/W1). Table 2 presents the mean 
scores for these measures. Analysis of variance of the arcsin transformations of the R2/W1 
scores yielded a significant main effect for grade (F=4.64; df=1,48; P<0.05); as Table 2 
indicates, subjects in grade 6 were generally more successful in correcting initial errors than 
subjects in grade 5. There were no significant differences between types of question or forms 
of feedback (in either case: F<1). The interaction between grade and type of question, 
however, was found to be significant (F=4.59; df=1,48; P<0.05); further analysis indicated 
that only on the fact questions did a significant difference between the grades occur (P<0.01). 
The type of question x form of feedback interaction was again nearly significant (F=3.76; 
df=1,48; 0.05<P<0.10), with the pattern of scores, as Table 2 shows, similar to the one 
presented in Table 1. For the other conditional probability measure, R2/R1, no differences 
were found, as can be seen in Table 2; subjects were generally very successful in repeating 
initially correct answers on the delayed test. 
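
As a minimal sketch of how the two conditional probability measures could be scored for a single subject, the following code takes item-by-item correctness on the two tests. The function names and the layout of the data are illustrative assumptions, not taken from the study. The arcsine transformation is shown in one common form (the arcsine of the square root of the proportion); the report does not state which variant was used.

```python
import math


def conditional_scores(first, second):
    """Return (R2/R1, R2/W1) for one subject.

    first, second: lists of booleans, one entry per item,
    True meaning the item was answered correctly on that test.
    A measure is None when no items fall into its condition.
    """
    after_right = [s for f, s in zip(first, second) if f]
    after_wrong = [s for f, s in zip(first, second) if not f]
    r2_given_r1 = sum(after_right) / len(after_right) if after_right else None
    r2_given_w1 = sum(after_wrong) / len(after_wrong) if after_wrong else None
    return r2_given_r1, r2_given_w1


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion p in [0, 1] (assumed variant)."""
    return math.asin(math.sqrt(p))
```

A subject who repeats every initially correct answer and corrects one of four initial errors would score R2/R1 = 1.0 and R2/W1 = 0.25; the transformed scores, not the raw proportions, would then enter the analysis of variance.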


TABLE 3

MEANS AND STANDARD DEVIATIONS FOR FIRST-TEST RESPONSES CORRECTLY
IDENTIFIED ON THE DELAYED TEST AS A FUNCTION OF GRADE AND FEEDBACK CONDITION

                                         Fact               Inference
Group     Condition                 M       (SD)        M       (SD)
Grade 5   Feedback without text     10.69   (1.43)       9.54   (2.10)
          Feedback with text        10.61   (1.38)      10.15   (2.11)
Grade 6   Feedback without text     11.15   (0.98)      10.92   (1.25)
          Feedback with text        11.23   (1.09)      10.61   (1.38)


Analysis of variance of the identification data, presented in Table 3, yielded significant
main effects for grade (F=4.55; df=1,48; P<0.05) and type of question (F=6.19; df=1,48;
P<0.05). Sixth-graders thus more often identified their first test responses at the time of the
delayed test than fifth-graders, and more fact than inference questions were identified. There
was no significant interaction.


For an understanding of the effects of feedback it is important to know whether a 
different proportion of previous responses is identified when initially incorrect responses are 
corrected (W1R2) than when, on second testing, the same incorrect response (W1W2s) or a 
different incorrect response (W1W2d) is given. Similarly, it seems worthwhile to find out 
whether differences in identification occur when initially correct responses are repeated on 
second testing (R1R2), in comparison to initially correct responses turned into incorrect 
choices on the delayed test (R1W2). The result of an analysis of this kind is shown in Table 4. 
It presents for fact and inference questions combined the number of cases in each category 
with the percentage of responses identified. Comparison of the R1R2 and R1W2 data shows
a very high identification score for responses correct on both tests, and a much lower score in
the R1W2 category. Thus, when children had forgotten which ‘correct’ answer they gave
on the first test, they were much more likely to give an incorrect response on second testing 
than when they had remembered their correct first test response. 


Of special relevance for understanding the mechanisms of feedback are the W1R2 and
W1W2d data. They show that when initially incorrect responses are corrected on the delayed
test, subjects are generally very successful in identifying their previous responses. However,
when subjects change their initially incorrect responses into different incorrect choices, they
are considerably less able to identify their initial choice. Taken together, these results do not
suggest that forgetting first test responses is a necessary or even desirable condition for
profiting from feedback; if anything, availability of first test responses seems useful rather
than detrimental.


TABLE 4

NUMBER OF CASES AND PERCENTAGES OF INITIAL RESPONSES IDENTIFIED BY FIFTH- AND
SIXTH-GRADERS FOR FIVE RESPONSE PATTERNS (see text)

                      Grade 5                           Grade 6
Response    Number of   First-test answers    Number of   First-test answers
pattern     cases       identified (%)        cases       identified (%)
R1R2        398         95                    475         97
R1W2         19         42                     13         15
W1R2        137         75                    109         85
W1W2s        42         71                     14         64
W1W2d        28         54                     13         46


DISCUSSION 


The results of this experiment show that both before and after feedback, sixth-graders
were more capable of answering test questions on the reading material than fifth-graders. 
On the initial test, the inference questions in particular were answered much better by the 
older children. Thus, though their factual retention of the text content was also somewhat 
inferior, the younger children were especially defective in applying their knowledge in making 
the inferences required. On the delayed retention test following feedback, the difference in 
performance on the inference questions was considerably reduced, though still significantly 
in favour of the sixth-graders.


These results corroborate recent findings on age-related improvement in inferential 
processing in comprehension and memory (for a review, see Paris and Lindauer, 1977). Paris 
and Upton (1976), for instance, read stories to children in kindergarten through fifth grade 
and found that, while older children correctly answered verbatim questions more often than 
younger children, questions concerning inferred presuppositions and consequences of the 
stories showed the greatest age-related improvement. Also, a recent study by Paris and 
Lindauer (1976) showed a developmental improvement in the ability to use implicit and 
explicit word prompts as retrieval cues for sentence memory. 


Even though the younger children were found to have profited substantially from 
feedback, the more sensitive conditional probability measure R2/W1 indicated that the sixth- 
graders were generally more capable of learning the correct answers from the feedback. In 
accordance with the results on the delayed test, the R2/W1 data show that the superior 
ability of the older children to profit from feedback was especially manifest for the category 
of fact questions. The results for the other conditional probability measure, R2/R1, demon- 
strate a very strong tendency in both age groups to repeat initially correct responses on the 
delayed retention test. This tendency appeared unaffected by form of feedback, or type of 
question. 


The results of this study give some support to the suggestion that access to the original 
text at the time of feedback should be beneficial for dealing with inference questions, though 
the interaction was not as strong as expected. Not expected, however, was the inverse 
relationship which seemed to hold for the category of the fact questions: performance on 
these questions seemed to benefit, to some extent, from the absence of the text at the time 
of feedback. The cause of this effect, too, is possibly to be found in the condition where 
subjects were allowed to refer to the text at the time of feedback. The presence of the text 
made these children perhaps pay undue attention to the inference questions, at the cost of 
the fact questions. 
The overall level of identification was very high: 88 per cent of the initial responses were
correctly identified by the children at the delayed retention test. This replicates the results
obtained in an earlier study by the present author (Peeck and Tillema, in press), and contradicts
a basic assumption of the interference-perseveration hypothesis, ‘that, following a
delay, subjects forget the responses to which they committed themselves on the initial test’
(Kulhavy and Anderson, 1972, p. 511). In fact, the analysis of the identification patterns
suggested that the continuing availability of an initially incorrect response could very well 
facilitate rather than inhibit learning a correct response at the time of feedback. 
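
The overall identification level can be approximated from Table 4 as a case-weighted average of the category percentages. The sketch below is only a consistency check: the dictionary layout and function name are illustrative assumptions, and because the table reports rounded percentages the result is approximate.

```python
# (number of cases, percentage of first-test answers identified), from Table 4
TABLE_4 = {
    "grade 5": {"R1R2": (398, 95), "R1W2": (19, 42), "W1R2": (137, 75),
                "W1W2s": (42, 71), "W1W2d": (28, 54)},
    "grade 6": {"R1R2": (475, 97), "R1W2": (13, 15), "W1R2": (109, 85),
                "W1W2s": (14, 64), "W1W2d": (13, 46)},
}


def overall_identification(table):
    """Case-weighted proportion of initial responses identified across all cells."""
    cases = 0
    identified = 0.0
    for patterns in table.values():
        for n_cases, pct in patterns.values():
            cases += n_cases
            identified += n_cases * pct / 100.0
    return identified / cases


# Close to the 88 per cent reported in the text.
print(round(100 * overall_identification(TABLE_4), 1))
```

The same weighting applied to each grade separately gives a higher value for grade 6 than for grade 5, in line with the grade effect reported for the identification data.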


Finally, it should be noted that the sixth-graders identified more initial responses than 
the fifth-graders, and that identification was significantly higher for fact questions than for 
inference questions. These results could perhaps be taken as supporting the interpretation 
by Peeck and Tillema that the identification data reflect depth of processing (cf. Craik and 
Lockhart, 1972) of test items at the time of the first test. Due to the sometimes obscure 
relationship between inference questions and the text, responses to these questions may 
occasionally have been the result of guessing rather than retrieving and applying relevant 
information from the text. It is assumed that guessing implies a relatively superficial level 
of processing with, as a result, relatively poor retention. This interpretation is in accordance
with the outcome of the immediate test, which showed that fifth-graders had considerable 
difficulty in answering inference questions. 
HYPOTHETICALITY AND THE OTHER OBSERVER IN A
PERSPECTIVE TASK 


By L. A. FEHR 
(University of New Orleans) 


Summary. An analysis of the ramifications of using a doll as the other observer in a co-ordination
of perspectives task is presented. The use of a doll was questioned because of its inanimacy
as well as its hypotheticality as an observer. Children aged between 6 and 12 years were tested
using a normal observer, a blindfolded observer, and a doll as the other observer. The lowest
overall and egocentric error rates were noted in the normal observer condition, with the
remaining two conditions yielding similar trends. It was concluded that performance in a
perspective task is diminished when the other observer is a hypothetical one.


INTRODUCTION 


A co-ordination of perspectives task requires a subject to predict the viewpoint of another
observer. In their research on this topic, Piaget and Inhelder (1956) asked children to select 
which of a set of pictures corresponded to what a doll could see from given positions on the 
perimeter of a three-mountain scene. Based on this perspective-taking research, Piaget and 
Inhelder established the following developmental stages: 

(1) At stage 1, which can last until 5:6 years to 6 years, the child is unable to comprehend 
the task at hand. 

(2) At stage 2-A, which can last until 7 years, the child responds egocentrically, believing 
that his is the only possible point of view. 

(3) At stage 2-B, which can last until 8 years, the child attempts to distinguish between 
different viewpoints, but is rarely successful. 

(4) At stage 3-A, which can last until 9 years, the child realises that horizontal and 
vertical relationships vary according to the position of the doll. However, correct responses 
are still made on an occasional basis. 

(5) At stage 3-B, which is completed in the tenth year, the child uses all relevant relation- 
ships to arrive at correct responses. 


There appears to have been a problem involving the methodology employed by Piaget 
and Inhelder (1956) in that children were asked to find the snapshots that a doll could have 
taken from given positions when it is questionable to state that a doll either could take a 
snapshot or has a perspective. Following the lead of Piaget, many studies (Dodwell, 1963; 
Laurendeau and Pinard, 1970) continued to use a doll as the other observer. Some years ago 
the potential importance of this methodological issue was mentioned by Fishbein et al. (1972) 
who used a human experimenter as the other observer. They pointed out that asking children 
to conceive of the viewpoint of a doll is a hypothetical problem because a doll can neither see 
nor take a photograph. An initial empirical test of this issue was performed by Cox (1975) 
who found that egocentric and overall error rates were higher for 7-year-old subjects when 
a doll rather than a person served as the other observer. Further research by Cox (1977) 
indicated that performances with a person as the other observer were superior to those with 
a doll or a self-projected (absent) person as the other observer. These results have led to the 
conclusion that the hypotheticality of the other observer may be a crucial issue in evaluating 
children’s ability to co-ordinate perspectives. 


The research of Cox (1975, 1977) poses three unanswered questions: 

(1) Why is it more difficult to co-ordinate perspectives with a doll rather than a person? 

(2) Are there developmental trends to be found relative to the ability of subjects to 
co-ordinate perspectives with a doll and a person? 

(3) Does the ability of children to co-ordinate perspectives with a variety of other 
observers vary according to the nature of the distortions that are depicted by the incorrect 
choice stimuli? 

In the present study, these questions were tackled by assessing the ability of first, third, 
and sixth grade subjects to co-ordinate perspectives when a normal observer, a blindfolded 
observer, and a doll served as the other observer. Relative to Question 1 above, if the
performance of subjects in the blindfolded observer and doll conditions is similar, it could
then be stated that a crucial factor in co-ordinating perspectives is the ability of the other 
observer to perceive the spatial stimuli. That is, both a doll and a blindfolded observer are 
hypothetical observers. However, if performances in the blindfolded observer condition are 
found to be superior to those in the doll condition, it could then be stated that the critical 
aspect of a doll as an observer is its inanimacy rather than its hypotheticality. Related to 
Question 2 above, the effect of variations in the nature of the other observer may be most 
strongly realised in the lower grades where it is clear that formal operational development has 
not yet begun. Finally, relative to Question 3, it is expected that a difficult task should 
accentuate the effects of variations in the nature of the other observer. However, if a task is 
totally beyond the capabilities of the subject, the importance of variations in the other 
observer may decrease. 


METHOD 


Sample 

The sample comprised 72 lower-middle class children. There were 14 girls and 10
boys in grade 1 (mean age=6:6, SD=0.30), 14 girls and 10 boys in grade 3 (mean age=8:6,
SD=0.35), and 14 girls and 10 boys in grade 6 (mean age=11:6, SD=0.48). In addition
to differing in age, the subjects were divided according to the nature of the choice stimuli to
which they were exposed (rotations, non-rotations). This yielded a total of six experimental
groups, each composed of seven girls and five boys.


Stimuli 

Children’s ability to co-ordinate perspectives was examined using a modification of the 
task described by Fehr and Fishbein (1976). The stimuli consisted of three small blocks 
(sphere, cube, and pyramid) located near the centre of a white circular board (diameter, 
30 cm). The board was located on a table which measured 75.5 cm in height. The sizes of
the three objects were as follows: sphere (height, 5.0 cm; diameter, 3.3 cm), cube (height,
5.0 cm; base, 3.8 cm), and pyramid (height, 5.0 cm; height at base, 2.2 cm). On each trial,
the three objects were depicted in one of seven right triangular arrangements. That is, two
of the three objects possessed the same x co-ordinate and different y co-ordinates, while two 
of the three objects possessed the same y co-ordinate and different x co-ordinates. In other 
words, on each trial, two and only two of the three objects were located on the same hori- 
zontal plane and two and only two objects were located on the same vertical plane. 
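
The arrangement constraint described above can be expressed as a simple check on three board positions. This is a minimal sketch under the assumption that each object is represented by an (x, y) co-ordinate pair; the function name and the sample co-ordinates are hypothetical and do not reproduce the seven arrangements actually used.

```python
from collections import Counter


def is_right_triangular(points):
    """Check the constraint on an arrangement of three objects.

    points: three distinct (x, y) positions on the board.
    Returns True when exactly two objects share an x co-ordinate
    and exactly two objects share a y co-ordinate.
    """
    if len(points) != 3 or len(set(points)) != 3:
        return False
    x_counts = Counter(x for x, _ in points)
    y_counts = Counter(y for _, y in points)
    # One pair on a shared x value and one pair on a shared y value;
    # the object common to both pairs sits at the right angle.
    return sorted(x_counts.values()) == [1, 2] and sorted(y_counts.values()) == [1, 2]
```

For example, objects at (0, 0), (0, 4) and (3, 0) satisfy the constraint, whereas three objects in a single row or along a diagonal do not.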


The choice stimuli consisted of two dimensional depictions of the objects and board 
which were drawn to half scale in relation to the actual objects and board. Six choice stimuli 
were used on each trial. These stimuli were presented to the subjects as an array of 3 rows 
by 2 columns on a piece of grey paper which measured 60 cm in length and 45 cm in width. 
One group of subjects in each age group was exposed to a non-rotation condition in which the 
choice stimuli included a correct choice, an egocentric choice (the view of the subject), two 
choices depicting horizontal errors (varying the positions of the two objects that possessed 
the same x co-ordinate in the object array), and two choices depicting vertical errors (varying 
the positions of the two objects that possessed the same y co-ordinate in the object array). 
The remaining subjects were exposed to a rotation condition in which the choice stimuli 
consisted of an egocentric choice as well as five rotated views of the array (60°, 120°, 180°, 
240°, and 300°). On each trial, one of these views was a correct choice. 


Procedure 

Each subject was brought into the testing room and seated in front of the table. Cushions 
of various sizes were provided for the smaller children in order to ensure that their viewing 
position was not different from that of the larger children. Each child was first given a pretest
in which frontal or own view trials were performed in order to determine if the ability to 
comprehend the nature of a spatial task was present. Children who failed the pretest were 
removed from further consideration. Following the pretest, a series of demonstration trials
was performed in order to acquaint the child with the nature of a co-ordination of perspectives
task. On the test trials, each subject was exposed to 30 co-ordination of perspectives
and 6 frontal trials. The trials were equally divided among the three types of other observer 
(normal observer, blindfolded observer, and doll). In performing each trial, the subject was 
asked, "Which of these pictures looks to you the way the blocks look to me?" The precise
wording of this question varied as a function of the nature of the other observer. After each 
trial, the arrangement of the objects, the choice stimuli, and the orientation of the other 
observer were changed. In order to control for the height of the other observer, the experi- 
menter sat in a chair that placed her eyes at a height identical to those of the doll when the 
doll was placed in position for the test trials (standing on the table). 


RESULTS AND DISCUSSION 


Table 1 presents the percentage of responses that were correct. An analysis of variance
indicated that percentage correct increased with age (F=30.49; df=2,66; P<0.01). This
effect is focused on differences occurring after the third but before the sixth grades. This is
supported by the significant differences that were found (using Scheffé tests) between the first
and sixth grade percentages (t=6.22, df=2,66; P<0.01) and the third and sixth grade
percentages (t=7.19, df=2,66; P<0.01) while significance was not found in comparing the
first and third grade data. This supports the notion of Nigl and Fishbein (1974) that there is
a qualitative shift in perspective ability between the ages of 9 and 11.


TABLE 1
MEAN PERCENTAGE OF RESPONSES THAT WERE CORRECT

Mean Age    Other                  Condition
(yrs.)      Observer      Non-Rotation    Rotation
6:6         Normal            10.0          10.0
            Blind              6.7           8.3
            Doll               6.7           8.3
8:6         Normal            10.0           5.0
            Blind              7.5           5.0
            Doll               7.5           5.0
11:6        Normal            64.2          26.7
            Blind             39.2          25.0
            Doll              40.0          24.2
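
A small arithmetic check on the Table 1 entries can be sketched as follows. It rests on an assumption not stated in the report: that each cell pools the 12 subjects of one grade-by-stimulus group over the 10 perspective trials run with each observer (30 trials divided equally among three observers), so that every percentage is a count out of 120 responses. The constant and function names are illustrative.

```python
SUBJECTS_PER_GROUP = 12            # seven girls and five boys per experimental group
TRIALS_PER_OBSERVER = 30 // 3      # 30 perspective trials split over three observers
RESPONSES_PER_CELL = SUBJECTS_PER_GROUP * TRIALS_PER_OBSERVER  # assumed pooling


def implied_count(percentage):
    """Nearest whole number of correct responses implied by a cell percentage."""
    return round(percentage * RESPONSES_PER_CELL / 100)


def is_consistent(percentage):
    """True if the percentage (one decimal) equals some whole count out of the cell total."""
    count = implied_count(percentage)
    return round(100 * count / RESPONSES_PER_CELL, 1) == percentage


# Sixth-grade entries and one first-grade entry from Table 1
for pct in (64.2, 39.2, 40.0, 25.0, 24.2, 8.3):
    print(pct, implied_count(pct), is_consistent(pct))
```

Under this assumption each of the listed percentages corresponds to a whole number of responses, for instance 64.2 per cent to 77 of 120.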


Percentage correct was also found to vary as a function of the other observer (F=4.03;
df=2,132; P<0.025). It can be noted in Table 1 that this difference can be attributed to
superior performances in the normal observer condition. This claim is supported by
significant Scheffé comparisons between the normal and blindfolded observer conditions
(t=2.68; df=2,132; P<0.05) as well as between the normal observer and doll conditions
(t=2.59; df=2,132; P<0.05). Significance was not found in comparing the blindfolded
observer and doll conditions. The similar percentages in the doll and blindfolded observer 
conditions offer initial support for the potential importance of the hypotheticality rather than 
the inanimacy of a doll as an observer. 


Significant differences in percentage correct were also noted as a function of stimulus
type (F=5.49; df=1,66; P<0.025) with higher percentages being noted in the non-rotation
than rotation condition. This effect was limited to the sixth grade subjects resulting in a
significant grade x type of choice stimuli interaction (F=3.68; df=2,66; P<0.05). It
appears that only the sixth-graders were able to make adequate use of horizontal and vertical 
errors as a means of eliminating the non-egocentric incorrect choices. 


Table 2 presents the percentage of responses that were egocentric. An analysis of
variance based on these percentages showed a significant effect relative to grade (F=20.94;
df=2,66; P<0.01). As was the case relative to percentage correct, this effect was focused on
differences between the first and third grades on one hand and the sixth grade on the other.
This claim was supported by Scheffé comparisons. More specifically, this effect was largely
a factor of the extremely low egocentrism percentages that were noted for the sixth grade
subjects in the non-rotation condition. This claim was supported by a significant grade x
choice stimuli interaction (F=3.68; df=2,66; P<0.05). Finally, significance was noted as
a function of the other observer (F=4.52; df=2,132; P<0.025). As was the case for percentage
correct, this effect can be attributed to superior performances (low egocentrism
percentages) in the normal observer condition. This is supported by significant Scheffé
comparisons between the normal and blindfolded observer conditions (t=2.55; df=2,132;
P<0.05) as well as between the normal observer and doll conditions (t=2.98; df=2,132;
P<0.05). Significance was not found in comparing the blindfolded observer and doll
conditions. This clearly supports the importance of the hypotheticality of the other observer
in a co-ordination of perspectives task.


TABLE 2
MEAN PERCENTAGE OF RESPONSES THAT WERE EGOCENTRIC

Mean Age    Other                  Condition
(yrs.)      Observer      Non-Rotation    Rotation
6:6         Normal            75.0          73.3
            Blind             83.3          80.0
            Doll              85.0          83.3
8:6         Normal            69.2          72.5
            Blind             75.0          74.2
            Doll              75.8          75.8
11:6        Normal            15.8          53.3
            Blind             33.3          60.0
            Doll              33.3          53.3


The findings of this study indicate that presenting subjects with an extremely difficult 
perspective task results in thought processes that are largely egocentric in nature. When 
subjects did not make egocentric errors, the most likely errors were adjacent errors (60° from 
the correct choice) in the rotation condition and horizontal errors in the non-rotation 
condition. The conclusion that must be reached, based on the results of this study, is that 
when subjects are unable to make viable spatial judgments, they seek the comfort of ego- 
centrism. This is especially true when the other observer is a hypothetical one. 
BOOK REVIEWS


ANDERSON, R. C., SPIRO, R. J., and MONTAGUE, W. E. (Eds.) (1977). Schooling and
the Acquisition of Knowledge. Hillsdale, New Jersey: Lawrence Erlbaum,
pp. x + 448, £14.00.


The very rapid development of research into cognitive psychology has left many educa- 
tional psychologists somewhat out of touch with the most recent thinking in this area. 
This book reports the papers and discussions at a conference of psychologists, educators 
and philosophers held in San Diego during November 1975. While cognitive psychology
has moved on from then, these papers are important for educational psychologists because 
they address distinctively educational questions. How is knowledge organised? How does 
knowledge develop? How is knowledge retrieved and used? What instructional techniques 
promise to facilitate the acquisition of new knowledge? 


The answers reflect, on the whole, the interests of cognitive psychologists and the 
chapters read like a Who's Who of that research area— Olson, Rothkopf, Rumelhart and 
Ortony, Meyer, Nelson, Resnick, Gagné and Anderson. But the book opens with characteristic
philosophic caution. Broudy questions the whole thrust of recent psychological
research. He argues that it has paid too much attention to the more overt forms of knowing, 
whereas it is tacit knowledge which children carry out from their schooling, much more 
than facts or skills (knowing what and knowing how). Many of the subsequent chapters, 
while providing admirable reviews of recent research in different aspects of cognitive psy- 
chology, demonstrate the continuing gap between the main interests of cognitive psychologists 
and those of educationalists. It was interesting to note that after a very different chapter 
by Berliner and Rosenshine and comments by Philip Jackson, the comments by the cognitive 
psychologists were tangential, if not irrelevant. There appeared to be little or no overlap 
in perspective. No doubt this reaction reflects the continuing gulf between pure and applied 
research, but it leaves a distinct uncertainty about whether cognitive psychology, with its 
apparently more direct links with education, can offer teachers any more of value than the 
earlier theories of the behaviourists. 


Nevertheless the authors of the concluding chapters—Gagné and Anderson—work hard 
to draw out for the reader the ideas which might have useful implications. All in all it is an 
important book, at an advanced level, which should influence the thinking of most psy- 
chologists interested in education. 

NOEL ENTWISTLE. 


BRUNER, J. S., and GARTON, A. (Eds.) (1978). Human Growth and Development.
Oxford: Clarendon Press, pp. viii + 167, p. £2.25, c. £4.25.


Complex societies generate complex problems and the rearing of the young has a 
special place among these. In this respect the accelerating growth of studies on child 
development is at once both heartening and yet also daunting. A cursory glance at the 
increasing size of journals can make the task of keeping informed of recent advances a 
seemingly formidable one. Human Growth and Development provides an excellent solution 
to this problem. The articles cover important topics in early development in a scholarly 
and eminently readable manner. Interest is focused on early social, cognitive, and linguistic 
development and the essentially interactive nature of the processes leading to the growth 
of competence. 


A major, pervasive theme is the importance of early social relationships. The articles 
by Hinde and Rutter are complementary, taking up similar issues related to different ques- 
tions. Relationships are rarely simple and the question of how one set of relationships 
can have consequent effects on other sets of relationships is explored by Hinde. His 
sophisticated investigations of the subtle interactions of primate mothers and infants have 
important implications for understanding the human infant's development of stable
relationships.

Continuing this theme, Rutter asks the interesting question why many children experi- 
encing adverse conditions do manage to achieve emotional stability and social competence. 
He considers attachment and types of bonding in relation to disruptive factors such as
multiple hospitalisation and family discord. There appears to be a disparity between 
Rutter’s optimistic stance and his underlying message that children who encounter multiple 
stress factors suffer inflated risks. Rutter’s optimism is based on our growing, though still 
highly limited, knowledge of the factors which can facilitate normal development and 
reduce stress. Tizard, however, paints a somewhat depressing picture of how little research 
findings need count if economic and political policy finds them inconvenient. Tizard’s 
concern is with the quality of care that can and should be given in terms of nursery provision 
and needs. Adopting an energetic, practical approach he outlines, with great clarity, the 
way nursery expansion could best proceed. The issues raised relate closely to the needs of 
mothers and children, having particular relevance in the context of this book. 


The social interactions of mothers and their infants are shown by both Bruner and Clark 
to be intrinsically involved in language acquisition. They examine how prelinguistic 
requisites for linguistic reference stem from primitive gesturing and pointing behaviour. 
Bruner, in greater detail, unfolds an exciting progression, showing how joint attention and 
joint action between the mother and child form the basis of dialogue and naming. Clark, 
drawing generally on different sources, gives a more prosaic account, and examines the 
cognitive prerequisites facilitating deictic reference such as ‘here’ and ‘there’, ‘this’ and
‘that’. The problem of how prelinguistic concepts map onto linguistic reference is a thorny
issue, discussed only by Bruner. Having eloquently dispensed with the strong notion of a
Language Acquisition Device, Bruner does not reject some innate propensity for language
acquisition which may assist the child, primed by his prior experience, to infer and generate
rules about the meaning and structure of language. 


Clark sets her investigations in a context of continuity and strategy, extending her 
interest in the development of deictic reference well beyond prelinguistic experience. There 
is general support among contributors for the notion of continuity in development apart 
from Inhelder who replaces it with ‘reconstruction’. The main orientation of her article is,
however, with strategies rather than Piaget’s constructivist approach which has more than a 
nodding obeisance in her introduction. The really interesting section illustrates new direc- 
tions in Genevan research. These reflect a change from using biological and mathematical 
models in studying broad, general development to adopting procedural models to investigate 
how and why children change their theories when solving problems. 


Overall, these papers provide illuminating insights into current research techniques and 
advances in knowledge. Psychologists, educationalists, and all with an intellectual interest 
in developmental processes will find them informative and stimulating. Human Growth and 
Development has another great virtue. It is modestly priced and attractively produced. 


MARGARET MARTLEW.


COHEN, G. (1977). Psychology of Cognition. London: Academic Press, pp. 241,
£7.50.


The Psychology of Cognition improves as it goes on. It begins with a brash assertion of
what semantic memory is without examples or evidence. Like me, you may happen to 
agree with the account given, but there are plenty of experts who do not wish to section off 
one part of memory and call it ‘semantic’. This partisan beginning did not maximise my 
confidence in what was to come. 


Cohen briefly discusses the empirical method of experimental testing, the rational 
method of logical analysis and computer simulation. She introduces Chomsky's ‘valuable’
distinction between competence and performance. I am not sure whether the use of the 
distinction in psychology truly parallels that in linguistics, but certainly it is difficult to 
identify the nature of any one part of the cognitive system because performance will be 
influenced by other parts. She recognises the difficulty of falsifying a competence model, 
but does not comment upon the problems this raises. 


Several current models of semantic memory are reviewed, with terminology that may 
be obscure to the reader unfamiliar with the subject. The accounts are accurate and concise
but perhaps too brief to do justice to either strengths or weaknesses. 


The book then steadily improves. There is quite a good review of visual imagery, 
although she is over-impressed by abstract description accounts. There is a clear review 
of problem solving, then two chapters on language and thought. These concentrate on 
chimps, the deaf, and those with language disorders, providing valuable summaries. However,
the chapter on ‘what is language’ has little about semantics or syntax.


There is a chapter which is rightly critical of traditional studies of concept formation, 
a look at the problems of computer simulation, and a good review of research on hemisphere 
differences. This illustrates both the considerable evidence for differences and the problem 
of saying just what the differences really are. 


The final chapter discusses the state of cognitive psychology. Cohen reviews the 
criticisms of Newell, Allport, and Bruner without locating any attractive alternative to 
present methods. She does consider the problems raised by the lack of a theoretical frame- 
work in which to embed new research, but she does not mention the difficulties of the limited 
means of collecting data about the cognitive system, nor does she discuss the psychology 
of the psychologist which is essential to understanding what is and is not done. 


Cohen does not try to present a model of cognition, nor a framework into which the 
various sections will fit. This does mean that her reviews are not distorted by needing to 
force facts down the throats of fledgling theories. Even so, there should be some justification 
for the particular choice of topics. It is always difficult to define ‘ cognition’, but a book 
claiming to describe ‘ the psychology of cognition’ should have something to say about 
perception, episodic memory, conscious experience and the relationship between thought 
and action. 


There is little new in the book and I was disappointed after reading some chapters. 
However, even if Cohen cannot offer us solutions to the issues she raises her reviews may lure 
new minds to the problems. 

PETER MORRIS. 


Francis, H. (1978). Language in Teaching and Learning. London: Allen and Unwin, 
pp. 139, £5.50.


This book, comprising seven essays, is more than just another addition to the numerous 
books on language in education. Although it is intended primarily for teachers, social and 
educational psychologists as well as sociologists will also find it very useful. For teachers 
the book opens up a number of problematic areas in the field of language in teaching and 
learning, and focuses on how to understand the nature of these problems rather than offer 
a set of rules as to how to teach. The author’s point of departure that language is an essential
part of human nature rather than an additional system of signs which the child comes to 
learn is relevant to the present effort of social psychologists to study the communicative and 
inter-subjective nature of language: the acquisition of language grows out of the child’s 
general knowledge about his social world. Knowledge itself is not a package of facts given 
to the child but a construction based on his previous experience. Sociologists may appreciate 
the author's effort to view and discuss this ‘previous experience’ not just as class-based as
has been traditional in this area but in the whole complexity of the child's cultural and social 
context. 


The presupposition that knowledge is constructed in the process of education rather 
than passively accepted by a child plays an essential role throughout the book. Both the 
pupil and the teacher bring into the educational process their previous experience and skills 
as well as expectations of each other's communication competence, perceptions of themselves 
and each other. Thus, it is in the two-way process of mutual interaction between pupil 
and teacher that the construction or modification of skill and knowledge takes place, that 
communication skills are extended or revised, and perceptions of the self and others altered. 
Such a conception of the process of constructing knowledge inevitably poses the question 
as to the problem arising when children and teachers from different social and experiential
backgrounds meet in the learning process. The learning history of each child is unique. The 
children have different abilities that are related to language learning; ways of life, parental 
values and expectations in a child’s upbringing are likely to affect his ways of talking and his 
responses to others. Differences between home and school lie not only in the selections of 
words and sentence forms, and in the registers adopted, but also in children’s preparedness 
for conformity and ability to fulfil expectancies of the formal education setting. Unless 
such differences between children’s homes are appreciated and acted upon by the teacher, 
the behaviour of the child who does not do what the teacher expects will be judged as un- 
favourable and the learning situation may be defined for the child by punishment without 
his understanding why. For the conformist who fulfils the teacher's expectations, on the 
other hand, it is rewarding. The message of the book is that a one-way control system 
between teacher and pupil should be replaced by techniques which encourage reflection, 
and by mutual testing out of thought and emotions in a real two-way communication system. 
The concept of appropriate guidance and adequate feedback information must be based on 
the fact that information given to one child may be encoded differently by another child. 


The author's ideas are illustrated by a great many examples taken from communication 
situations between a child and his mother as well as by examples from a teacher-pupil setting.
These make the book lively and interesting to read. 

IVANA MARKOVA. 


GILLHAM, B. (Ed.) (1978). Reconstructing Educational Psychology. London: Croom
Helm, pp. 197, p. £4.50, h. £7.95.


The editor of this set of papers is a university lecturer in educational psychology. He 
has collected nine other people around him who have produced eleven papers. Three of 
the writers work in universities and seven work in school psychological service. The work 
which they do is important because the book is a collection of personal views, supported 
on occasion by citing authority, and approving of research, but seldom using its findings
precisely. 

The one common feature is that all are dissatisfied with the role and working system of 
educational psychology but all advocate different ways of changing it and diagnose different 
symptoms and different diseases. 


Gillham, in analysing the patient (Educational Psychology) diagnoses excess pro- 
fessionalism, bureaucratisation, defensive in-group activity, resisting the outsider, and status- 
mongering. It would seem that a United Kingdom style Proposition 13 of the Californian 
scene may solve the problem, as it would force accountability into the open, which it appears 
could only be to the advantage of psychology. 


He also advocates a move from dealing with individual children to improving the whole 
system but accepts that this will put strains on relations with administrators in education 
and with advisers. He raises the spectre of education psychologists becoming like social 
workers, obliged to tackle innumerable problems which they cannot possibly alleviate. He 
is quoting Tizard here. He cites the current craze for innovation in education as a way of 
easing the anxiety of educators as they don’t know what to do. 


Dessent gives an essentially historical account to explain how the school psychological 
service began and developed, and how the Summerfield Report of 1968 helped this service. 
It is not surprising, but it is a lost opportunity, that comparisons were not made with Scotland 
which never had the problem of the medical as the director of the child guidance team. 
Subsequently he reports on interviews with three psychologists, all of whom have dis-ease 
about the service but offer different solutions. One wants to move more into the schools, 
not professing teaching skill but professing psychological skill, rightly claiming that psy- 
chologists should practise their own and not another profession. The second seems rather 
ashamed of the marbles-counter image of the psychologist and seeks a solution in child 
and in developmental psychology. The third seems to be advocating commonsense and that 
efforts should be made to change the system and not the child. Two of those interviewed
develop their credo at length in their own chapters. 


Roe attacks the medical model and reasonably asks why emotions and some behaviour 
are considered to be medical matters but learning and some other forms of behaviour are 
non-medical matters. While he agrees with Rutter that maladjustment is not a disease yet 
to class it as an abnormality is for him just the medical model in a weaker form. In place 
of it he prefers behaviourism in a context supported by more rigorous definition of ‘factors’.
The classic sin for him is to undermine the confidence of the clients and their parents. 


Hargreaves writes on labelling theory, which he considers to be more accurately described 
as the interactionist approach to deviance. Deviance itself is a property put upon forms of 
behaviour by those who observe it. From this he concludes that psychologists should work 
in schools, preferably on preventive work. 


When intelligence tests, and in this case personality tests, get their usual savaging as is 
the popular sport of the time, it seems that a better activity is positive assessment of learning 
problems which could well be done by the teacher. Also, when claiming that psychometrics 
has failed Gillham supports Goldstein in his criticism of Rasch modelling. 


Are there any solutions to these problems? Burden applies systems theory to educa- 
tional psychology training and ends up with CIPP which is hardly world-shattering. 


Later on, in a chapter headed hopefully “The Process of Re-construction: An Overview”
Bruner is cited as basing the failure of educational psychology on a basic flaw: “the
task was not really one of application in any obvious sense but of formulation.” Oh, well,
Burt had his chance and didn't Bradley get a lot of mileage from his ‘tragic flaw’. There
must be a thesis in the 3 B's. 


There is also criticism of special school segregation related to the danger of institutional
forces which ‘run counter to the needs of the child’. This doctrine of need goes quite un- 
challenged in the book. Well 25 years ago segregation was the solution so, as it has not 
been a ‘success’, try the opposite. 


There is one gleam of truth that the current service could not cope with a 5 per cent 
demand from the school population. 


The tone of the book is of a profession full of discontent, doubting its role and doubting 
its value. If it is any consolation the same is happening with sociologists, curriculum 
developers, evaluators, educational technologists, statisticians, and many other groups 
dedicated to helping education. The group considered in this book can certainly identify 
their discontent but when the offering in the last chapter is, “The ground clearing of the
last 5 to 10 years has made it possible for growth to occur at the level of both infrastructure
and superstructure so there are unusual opportunities for development across age levels and
hierarchies. Dare we take it?”, they still have a long way to go to find a solution.


J. G. Morris. 


Keats, J. A., Collis, K. F., and Halford, G. S. (1978). Cognitive Development:
Research Based on a Neo-Piagetian Approach. Chichester: Wiley, pp. 458,
£14.00.


This volume consists of thirteen chapters and an appendix by the editors and four of
their colleagues who have all worked at some stage in the Department of Psychology, New- 
castle University, Australia. As Professor Lunzer points out in his foreword, the seven 
authors approach Piagetian theory from differing perspectives; each finds evidence to sup- 
port a stage theory of cognition, corresponding to Piaget's stages, but linked to an informa- 
tion processing analysis rather than a genetic epistemological model. 


Halford provides introductory and summary chapters as well as outlining a working
model of Piaget's stages in which he reports his own experimental data to support the 
characterisation of three stages corresponding to Piaget's pre-operational, concrete, and 
formal operational stages. The stages are seen as linked to a ‘limits to learning’ concept,
limits reflecting limitations in the complexity of symbolic systems which a child can
construct at a given age, in turn linked to the hypothesised number of chunks of information
available in short-term memory.


Shepherd provides a full description of Piaget’s work on pre-operational to concrete 
operational stages with particular reference to number, measurement, sets, and shapes. A 
study by the author showing gains in conservation by the experimental groups is discussed 
in terms of consistency with Piagetian equilibration. In his second chapter Shepherd dis- 
cusses the mathematical concept of a groupoid—the combining of two elements to be 
replaced by a third in a closed mapping—as a replacement for Piaget’s much criticised 
‘grouping’.

Keats and Keats contribute two chapters on the role of language in the development of 
thinking, and the types of evidence relevant to the language and thought debate. They 
report two of their own studies which are discussed in terms of ‘crucial testing’ of theories
of the role of language, and provide support for a modified Piagetian viewpoint. 


Collis and Jurd each have two chapters related to the applications of stage theory to
maths and history respectively. Both suggest teaching strategies derived from their experi- 
mental work; both would need simplifying and elaborating to be practicable as classroom 
practice. 


Seggie provides a lucid summary of Piaget's formal operational thinking and reviews 
related research; his own work linking a Bruner-type concept formation task to Piaget is 
mentioned but not described in detail. 


An appendix by Keats makes a rather belated attempt to apply the Rasch model to the 
stage theory outlined by the authors. This envisages an inappropriate application of the 
Rasch model to heterogeneous dimensions, such as IQ.


Overall the authors provide a well documented review of Piaget's ‘cognitive’ stages—
from pre-operational onwards, and related literature. Their own work occupies a relatively 
small portion of the text; this seems an unhappy balance since the reader would need to be 
familiar with developmental psychological theory, particularly the work of Piaget, to be in 
a position to evaluate the new perspectives suggested by the authors, but in that case would 
be familiar with much of the material reviewed.

LEA PEARSON.


Lesser, H. (1977). Television and the Pre-School Child: A Psychological Theory of 
Instruction and Curriculum Development. London: Academic Press, pp. 261, 
£11.35.


The problems associated with discussing the mass media are notorious. Despite heavy 
financial investment, especially in the United States, there are few clearly proved conclusions 
about the effect of television. Some might argue that it is impossible to arrive at any answers 
about the nature of the mass media and their effect on children. Others assume that the 
question is so obvious it hardly deserves discussion. 


One of the reasons for such disappointing results in research on the mass media is the 
fundamental dichotomy at the heart of most approaches to this subject. The researcher 
has a choice between analysing response and effect, or trying to work out pragmatic uses of 
the media as instruments of instruction. Dr. Lesser's book takes the latter approach and 
discusses various uses, or possible uses, of television as a medium of instruction, relating its 
concern especially to the famous American children's programme Sesame Street. The 
author, however, is too intelligent not to be aware of the alternative approach to the mass 
media, and his awareness sometimes gets in the way of his pragmatism. 


The problem with this book, as there is in the whole field, is that there is no clear theory 
of ‘ effect’. Research is often divided into either content analysis (Gerbner et al.) or into 
the study of child development usually based on Piagetian modes. There is very little 
thought for a sophisticated idea of effect and response. No version of Lasswell’s famous 
formula, however complex, can help psychological understanding of children’s responses.
The central area of effect is still untapped. 


Dr. Lesser is aware of the problem. Even he, in his quick survey of research on the 
nature of violence, points out that it is “ depressing to realise that the overwhelming body of 
research that purports to demonstrate a relationship between televised violence and violent 
behaviour is shackled to untenable theoretical and methodological considerations that render 
the research findings virtually useless as evidence in social policy considerations ” (p. 30). 
Having said that, Dr. Lesser turns his back on this problem and concentrates on pragmatic 
endeavours to give alternative television programmes. His excuse for the book is that he 
has based his pragmatism on careful summaries of other people's research, but the dichotomy 
between the analysis of the medium, and the use of the medium for instruction, still remains. 
Quite early in the book this shift towards prescription makes itself quite clear: “The proper
approach would be to design television programming so that it continually pushes the child
toward more and more developmentally advanced forms of voluntary attention while utilising
the natural growth in the stability, scope, and productiveness of involuntary attention.”
Dr. Lesser is aware of the importance of involuntary attention but does not follow this up. 


This is because he assumes that child development on the Piagetian model can be 
taken for granted and applied. He summarises the research of Bandura and Berkowitz, 
notes their limitations, but then ignores the implications. He also finds himself analysing 
the content rather than effect of programmes. He is busy producing ‘ viable alternatives to 
Sesame Street’. The dichotomy of the book, as its subject matter, is that Dr. Lesser 
realises the importance of understanding the medium, but concentrates his attention on the 
message of educational programmes. His ideas therefore are comparatively simple and have 
more to do with surface content rather than a subtle understanding of children's responses. 


This book is not really about television and the pre-school child. Its sub-title, “A
psychological theory of instruction and curriculum development”, describes it far more
accurately. It is a mixture of analysis, suggestions, and different models, but never goes to 
the heart of the problem. He gives himself away by concluding (p. 239) that “ at some point, 
society will have to wrestle with its values and determine what role(s) television will play 
vis-à-vis children ”. This telling appeal to ‘values’ and society shows that he is not himself
satisfied with his own understanding of the effects of television on children. 


Given this limitation, the book is, nevertheless, readable and contains useful summaries. 
It is most effective in illustrating the difficulty of the subject. No amount of sociological 
analysis or Piagetian models can replace the need for a proper understanding of children's 
responses. 

CEDRIC CULLINGFORD.


SCHOOL INFLUENCE AND PUPIL ATTITUDE
TOWARDS RELIGION


By L. FRANCIS 
(London Central YMCA) 


Summary. Attitudes towards religion of 2,272 third and fourth year junior pupils were 
studied in relation to the type of religious education provided by their school. Com- 
parisons were made between Roman Catholic aided, Church of England aided, and 
Local Education Authority schools. Comparisons were also made between schools 
which provide no religious education, schools which follow a ‘ traditional ’ syllabus, and 
schools which follow a ‘modern’ syllabus. It was found that there was no difference
in pupil attitude in State schools which provide no religious education, State schools 
which follow the ‘ traditional’ syllabus, State schools which follow the ‘ modern’ 
syllabus, and Church of England Schools which follow the ‘ traditional’ syllabus. 
Pupils in Roman Catholic schools scored higher on the attitude scale and pupils in 
Church of England schools which followed the ‘modern’ syllabus scored lower. These
findings question both the effectiveness of syllabus revision in religious education, and 
the religious contribution of certain Church of England aided primary
schools. 


INTRODUCTION 


THE churches continue to exercise a strong influence on the English system of schools 
in two ways. First, religious education remains a compulsory component of the 
school curriculum, and the churches play an important part in determining the 
interpretation of this component through their place on the committees which define 
the agreed syllabuses of religious instruction (Dent, 1947; Education Digest, 1977). 
Second, the churches maintain a high level of investment in maintaining voluntary 
aided schools in which denominational instruction is possible (Alexander and Taylor, 
1977). Little is known about the effect of these influences on children's attitudes 
towards religion and yet such information is important in evaluating the churches’ role 
in education. 


Within the English State system of education there are clear differences between 
three main types of primary school, Roman Catholic aided, Church of England aided 
and Local Education Authority provided. Within the Local Education Authority 
schools there are three main approaches to religious education. Some schools tend 
to ignore religious education altogether (Assistant Masters Association, 1977). Some 
schools tend to follow the ‘traditional’ form of agreed syllabus like Cambridgeshire
(1949) and others tend to follow a more ‘modern’ form of syllabus like West Riding
(1966). Church of England Schools also tend to follow these two different types of 
syllabus. 


The ‘traditional’ form of agreed syllabus was developed primarily from a 
consideration of the religious material to be covered during the period of compulsory 
schooling. These syllabuses are essentially Bible-centred. The material is organised 
according to theological consideration. Shape is given to the syllabus by tracing the 
history of the Judaeo-Christian religion through sections like ‘The Fall of the Kingdom’,
‘The Exile’ and ‘Reconstruction in Palestine’.


The ‘modern’ form of syllabus was developed as a response to the psychological
research of Goldman (1964, 1965). These syllabuses attempt to be child-centred. The 
material is organised according to an understanding of the developmental stages
through which the child supposedly matures. Shape is given to the syllabus by themes
like ‘Wells and Water’, ‘Sheep and Shepherds’ and ‘Christian Festivals’.


An important comparison can therefore be made between pupils educated
according to six different environments of religious education:
(1) Roman Catholic aided schools which employ a Roman Catholic syllabus;
(2) Church of England aided schools which employ the ‘traditional’ syllabus;
(3) Church of England aided schools which employ the ‘modern’ syllabus;
(4) Local Education Authority schools which employ the ‘traditional’ syllabus;
(5) Local Education Authority schools which employ the ‘modern’ syllabus;
(6) Local Education Authority schools which provide no religious education. 


It is reasonable to hypothesise that these different environments will be reflected 
in different pupil attitudes towards religion. Moreover, if these environments are 
achieving what they set out to achieve it should be possible to predict the nature of 
these differences in pupil attitudes. The churches hope that pupils in church primary
schools possess a more favourable attitude towards religion than pupils in state
schools. The churches also hope that children in state primary schools which provide
religious education possess a more favourable attitude than those in schools which do 
not provide religious education. Finally, the advocates of the new syllabuses of 
religious education hope that pupils taught according to these syllabuses record more 
favourable attitudes than pupils taught according to the ‘ traditional’ syllabuses. 
None of these issues has been previously researched among English primary school
children. 


Previous research has, however, demonstrated that pupil attitudes towards 
religion are related to a number of other variables. These variables are sex and age 
(Garrity, 1966; Povall, 1971), intelligence (Hyde, 1959), religious behaviour (Johnson, 
1966), parental religious behaviour (Greer, 1971), socio-economic background (Jones, 
1962), the geographical location of the school (Alves, 1968), and the co-educational or 
single sex status of the school (Wright and Cox, 1967). Since these variables may 
contaminate the influence of the school on pupil attitudes they must be taken into 
consideration in the research design. Reading age is also important as a factor 
affecting the child’s response to the questionnaire (Cookson, 1970). 


METHOD 
Sample 
Five co-educational junior schools representative of each of the six school types 
were selected from three counties in the South-East. The schools were considered to 
be comparable in terms of pupil intake. The attitude questionnaire was administered 
to all third and fourth year junior pupils in these schools. 


Measures 

(1) Scale of attitude toward religion Form ASC6 (see Appendix). This is a 24-item 
Likert type attitude scale developed by Francis (1976, 1978). The scale contains items 
concerned with the child’s attitude towards God, Jesus, the Bible, prayer, the church 
and religion in school. The 24 items were selected by means of item analysis from an 
original bank of 110 items. The scale is known to function reliably (alpha coefficient =
.96) and validly among third and fourth year junior children. Associated with this
attitude scale is a four-item Guttman type lie detection scale (coefficient of
reproducibility = .92).

(2) Scale of Religious Behaviour (see Appendix). This is a four item Guttman 
type scale also developed by Francis (1976) for use among primary school children on 
the basis of the scale proposed by Hyde (1959) (coefficient of reproducibility = .92).


(3) Socio-economic Groups. The scale proposed by the Office of Population 
Censuses and Surveys (1970) is used in association with information about parental 
employment. 

(4) Parental Religious Behaviour. Parental religious behaviour is measured in
terms of parental church attendance. A score of 1 is given where one parent attends 
church at least once a month. A score of 2 is given where both parents do so. 


(5) IQ Rating. IQ grades were made available from the school records and 
collapsed into a scale of five categories according to the following transformation: 
IQ less than 86 = 1
86-95           = 2
96-105          = 3
106-115         = 4
Over 115        = 5
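
The two coding rules described under measures (4) and (5) can be written out as a minimal Python sketch. The function names are hypothetical and do not come from the study. Two details are assumptions: IQ is treated here as a whole-number score, and a score of 0 is given where neither parent attends church monthly, a case the text does not state explicitly.

```python
def iq_category(iq: int) -> int:
    """Collapse an IQ score into the five categories listed in the text."""
    if iq < 86:
        return 1
    if iq <= 95:
        return 2
    if iq <= 105:
        return 3
    if iq <= 115:
        return 4
    return 5


def parental_behaviour_score(first_parent_attends: bool,
                             second_parent_attends: bool) -> int:
    """Score 1 if one parent attends church at least once a month, 2 if both.

    Returning 0 when neither parent attends is an assumption; the text only
    defines the scores of 1 and 2.
    """
    return int(first_parent_attends) + int(second_parent_attends)


if __name__ == "__main__":
    for score in (80, 86, 100, 115, 130):
        print(score, "->", iq_category(score))
    print("one parent attends ->", parental_behaviour_score(True, False))
```

Coding both variables as small ordered integers is what allows them to enter the regression equations as ordinary numerical predictors.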


Procedure 

The questionnaire was designed for group administration by the normal class 
teachers who were required to follow a standardised procedure. This method of 
administration was shown to produce minimal response falsification (Francis, 1979). 


Data Analysis 

The complex inter-relationships between the criterion variable of attitude towards 
religion and the predictor variables of age, sex, IQ, socio-economic group, religious 
behaviour, parental religious behaviour and school influence, require a multivariate 
system of data analysis. The data were analysed by means of linear multiple regression 
and path analysis using the SPSS package (Nie et al., 1975). Before it was considered
appropriate to use this technique a careful empirical check was made on the extent to 
which the assumptions underlying this technique were violated by the data. Full 
details of this lengthy examination are provided in Francis (1977). As is inevitable 
with social science data all the assumptions were not strictly met, but the violation of 
these assumptions was not considered sufficient to invalidate the interpretation of the 
analysis (Bohrnstedt and Carter, 1971; Gardner, 1975). 


RESULTS 


A statement of the relationship between school influence and pupil attitude
emerged through the refinement of a sequence of path models. These models were 
derived from the basic correlation matrix presented in Table 1 in terms of the Pearson 
Product Moment Correlation Coefficient. 


TABLE 1
CORRELATION MATRIX (N VARIES BETWEEN 2272 AND 1676)

                               School                                      Soc.
                               Year     Sex      Rel. B.  Par. B.  IQ      Cl.
Attitude towards religion      -.074    +.163    +.546    +.341    +.033   -.026
Social class                   -.046    +.014    -.081    -.183    -.300
IQ                             -.059    -.026    +.052    +.091
Parental religious behaviour   +.033    +.021    +.483
Religious behaviour            -.044    +.163
Sex                            -.005

Notes: 1. All correlations above 0.041 are statistically significant.
       2. Sex is scored 0 = boys, 1 = girls.
       3. School year is scored 3 = third year junior, 4 = fourth year junior.
       4. IQ is scored so that 1 = low IQ, 5 = high IQ.
       5. Social class is scored so that 1 = professional classes.


Path Model One

The three predictor variables which cannot be influenced by the other variables 
in the equation are sex, school year and social class. These three variables are shown 
by the correlation matrix to be completely independent. They therefore become the 
point of origin from which the first path model is constructed (Figure 1). In this 
model the four criteria variables are significantly intercorrelated. The path model
makes no assumption regarding the direction of causality in that intercorrelation. 
Interactions between the predictor variables were not significant and have therefore 
been omitted from the path diagram. 


Table 2 summarises the nature of the prediction which is facilitated by the first 
path model. Although seven paths are statistically significant, clearly little predictive 
power is associated with the model. Social class emerges as the strongest predictor of 
IQ, accounting for 9.3 per cent of the score variance. Similarly social class is the
strongest predictor of parental religious behaviour, accounting for 3.3 per cent of the
score variance. Sex emerges as the strongest predictor of both religious behaviour
and attitude towards religion, accounting for 2.1 per cent of the variance in the
behaviour scores and 2.7 per cent of the variance in the attitude scores.


Path Model Two 

Path model 2 assumes that both religious behaviour and attitude towards 
religion among third and fourth year junior children are dependent upon parental 
religious behaviour and not vice versa (Figure 2). Again interaction terms were not 
significant and have been omitted from the diagram. From Table 3 it can be seen that 
parental religious behaviour is an important factor in predicting scores both in attitude 
towards religion (12.4 per cent of score variance) and religious behaviour (24.7 per cent
of score variance). The point of major interest to emerge from path model 2 is that 
the direct path between social class and religious behaviour which is present in path 
model 1, no longer appears to be statistically significant. The apparent influence of 
social class on the child's level of religious behaviour is a function of the direct 
influence of social class on parental religious behaviour and of the direct influence of 
parental religious behaviour on the child's religious behaviour. 
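
The disappearance of the direct path can be illustrated with a minimal Python sketch. It is not the SPSS analysis reported here. It assumes that the three correlations below have been read correctly from Table 1, and it simplifies the model to two predictors, leaving out sex and school year. The function name is hypothetical.

```python
def two_predictor_betas(r_y1: float, r_y2: float, r_12: float):
    """Standardised regression weights for y on two correlated predictors."""
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Correlations as read from Table 1 (assumed to be correctly transcribed).
R_CHILD_PARENT = 0.483    # child's religious behaviour x parental religious behaviour
R_CHILD_CLASS = -0.081    # child's religious behaviour x social class
R_PARENT_CLASS = -0.183   # parental religious behaviour x social class

beta_parent, beta_class = two_predictor_betas(
    R_CHILD_PARENT, R_CHILD_CLASS, R_PARENT_CLASS
)

# Indirect route: social class -> parental behaviour -> child's behaviour.
indirect_class = R_PARENT_CLASS * beta_parent

print(f"direct path, parental behaviour: {beta_parent:+.3f}")
print(f"direct path, social class:       {beta_class:+.3f}")
print(f"indirect path, social class:     {indirect_class:+.3f}")
```

Under these assumptions the direct weight for social class comes out close to zero (roughly +.01), while the indirect route through parental religious behaviour (roughly -.09) reproduces almost all of the observed correlation of -.081, which is the pattern described for path model 2.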


Path Model Three 

In path model 2 IQ and religious behaviour are positively correlated. Path model 
3 assumes that IQ is more likely to have a causal influence on religious behaviour than 
vice versa (Figure 3). Once again interaction terms were not significant and have been 
omitted from the diagram. From Table 4 it can be seen that according to this model, 
IQ contributes no additional predictive power above that contributed by the other 
variables in the equation. The correlation between IQ and religious behaviour shown 
in path model 2 is now best explained as a function of the direct causal influence of 
social class upon IQ and the indirect influence of social class upon religious behaviour 
mediated through parental religious behaviour. 


Path Model Four 

In path model 3 attitude toward religion and religious behaviour are positively 
correlated. Path model 4 assumes that the causal influence in the relationship is in the 
direction of behaviour influencing attitude. For the purposes of this study the 
prediction of pupil attitude is the primary object of interest. Path model 4 is presented 
in Figure 4. In this case the 15 two-way interaction terms among the predictor 
variables are just statistically significant (increase in R² = .01421, F(15,1620) = 2.021, 
P < .05). In real terms all 15 of the interaction terms account for such a small proportion 
of the score variance that it seemed legitimate to ignore them. Table 5 demonstrates 
that religious behaviour is in fact the most powerful predictor of scores of 
attitude toward religion, accounting for 32 per cent of the score variance. The 
significant feature to emerge from this model is the way in which parental religious 
behaviour now only accounts for 0.7 per cent of the variance in attitude scores, whereas 
in model 3 it had accounted for 12.4 per cent. This observation suggests that the 
influence of parental religious behaviour upon the child’s attitude towards religion is 
largely an indirect influence. Parental religious behaviour influences the child’s 
religious behaviour and, in turn, the child’s religious behaviour influences his or her 
attitude towards religion. 


Path Model Five 

Path model 4 summarises the influence of age, sex, social class, IQ, parental 
religious behaviour and the child's own religious behaviour upon reported attitudes 
towards religion. These are the variables which are likely to confound the direct 
observation of school influence upon pupil attitudes. Path model 5 is now able to 
enter into the equation, as the final block in blockwise multiple regression, the variable 
which is of central importance to the present project, namely the type of school 
approach to religious education (Figure 5). This variable is entered into the regression 
equation through the creation of a set of dummy variables (Cohen, 1968). In this 
system of dummy variables Local Education Authority schools using the ‘traditional’ 
form of syllabus were used as the reference point. There are no very significant two-way 
interactions among the predictor variables in this path model. Table 6 demonstrates 
that only two styles of school contribute further significant predictive power to 
the equation, over and above the information conveyed by the variables already taken 
into account in path model 4. The two styles of school which depart significantly 
from the reference point are the Roman Catholic schools and the Church of England 
schools employing the ‘modern’ form of syllabus. The Roman Catholic schools 
explained an extra 1.4 per cent of the score variance on the scale of attitude towards 
religion. The Church of England schools employing the ‘modern’ form of syllabus 
explained an extra 0.4 per cent of the score variance. The regression coefficient 
indicates that the predicted attitude scores for children in Roman Catholic schools 
would be a little over 6 points higher on the 96-point attitude scale than for children 
in the other schools. Similarly the predicted attitude scores for children in Church of 
England schools employing the ‘modern’ form of syllabus would be a little more than 
3 points lower than for children in the other schools on the same scale. 
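
The blockwise procedure described for path model 5 can be sketched as follows. This is only an illustration under stated assumptions: ordinary least squares, purely synthetic data, and hypothetical function names (`dummy_code`, `r_squared`, `f_change`) and school labels; none of it is taken from the original analysis. The reference category is coded by omitting its column, so that each remaining dummy variable expresses a departure from the reference point.

```python
import numpy as np

def dummy_code(labels, reference):
    """Code a categorical variable as 0/1 columns, omitting the reference level."""
    levels = [lv for lv in sorted(set(labels)) if lv != reference]
    columns = np.array([[1.0 if lab == lv else 0.0 for lv in levels]
                        for lab in labels])
    return columns, levels

def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ beta
    centred = y - y.mean()
    return 1.0 - (residual @ residual) / (centred @ centred)

def f_change(delta_r2, r2_full, n_added, df_residual):
    """F-ratio for the increase in R^2 when n_added predictors are entered last."""
    return (delta_r2 / n_added) / ((1.0 - r2_full) / df_residual)

# Synthetic illustration (hypothetical data, not the study's).
rng = np.random.default_rng(0)
n = 300
school = rng.choice(["LEA traditional", "LEA modern", "RC", "CE modern"], size=n)
earlier_block = rng.normal(size=(n, 3))  # stands in for the earlier predictors
dummies, levels = dummy_code(school.tolist(), reference="LEA traditional")
attitude = (earlier_block @ np.array([0.5, 0.3, 0.2])
            + 2.0 * (school == "RC") + rng.normal(size=n))

# Final block: school type entered after all other predictors.
r2_reduced = r_squared(earlier_block, attitude)
r2_full = r_squared(np.column_stack([earlier_block, dummies]), attitude)
delta = r2_full - r2_reduced
df_residual = n - (earlier_block.shape[1] + dummies.shape[1]) - 1
print(levels, round(delta, 4),
      round(f_change(delta, r2_full, dummies.shape[1], df_residual), 2))
```

Entering the school dummies as the last block means that the increment in R² reflects only what school type adds once the earlier predictors have been taken into account.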


CONCLUSION 


Two important conclusions emerge from this study regarding the effectiveness of 
the churches’ involvement in education. First, regarding the use of agreed syllabuses 
in Local Education Authority schools, it was shown that there was no difference in 
attitude scores between children taught in schools which employ the ‘traditional’ 
syllabus, children taught in schools which employ the ‘modern’ syllabus and children 
taught in schools which provide no religious education. The agreed syllabus teaching 
appears to be having no effect on pupil attitudes. Second, regarding the provision of 
church aided schools, it was shown that children taught in Roman Catholic schools 
were reporting a more favourable attitude towards religion than children taught in 
other schools. Roman Catholic aided schools seem to be achieving their desired 
effect. Children taught in Church of England schools either reported no difference in 
attitude scores from children taught in state schools, or, in the case of those which 
employ the ‘modern’ syllabus, a lower attitude score. Church of England aided 
schools seem to be having either no effect or a negative effect on their pupils’ attitudes 
towards religion. 


APPENDIX 
The twenty-four item scale of Attitude Towards Religion including the four item lie scale. 


1. I find it boring to listen to the Bible 
2. I know that Jesus helps me 
3. Saying my prayers helps me a lot 
4. The Church is very important to me 
5. Going to church is a waste of my time 
6. I always say my prayers every night* 
7. I want to love Jesus 
8. I think church services are boring 
9. I think people who pray are stupid 
10. God helps me to lead a better life 
11. I like school lessons about God very much 
12. I want to go to church every day* 
13. God means a lot to me 
14. I believe that God helps people 
15. Prayer helps me a lot 
16. I know that Jesus is very close to me 
17. I think praying is a good thing 
18. I think the Bible is out of date 
19. I believe that God listens to prayers 
20. I think about God all the time* 
21. Jesus doesn't mean anything to me 
22. God is very real to me 
23. Saying prayers in school does me no good 
24. I like going to church more than playing games* 
25. The idea of God means much to me 
26. I believe that Jesus still helps people 
27. I know that God helps me 
28. I find it hard to believe in God 


* The four items of the lie scale 


The four item scale of Religious Behaviour. 


1. Do you go to Sunday School or Church Youth Club most every week? 
2. Do you go to church once a week? 
3. Do you go to church at least once a month? 
4. Do you say your prayers by yourself at least once a week? 


ATTITUDES TOWARDS THE SCHOOL, THE TEACHER, 
AND CLASSMATES AT THE CLASS AND INDIVIDUAL 
LEVEL 


By J.-E. GUSTAFSSON 
(Institute of Education, University of Gothenburg) 


Summary. The responses of the pupils in 60 6th grade classes to a 40-item questionnaire 
assessing attitudes towards the school, the teacher, and the classmates are factor analysed 
at two levels of aggregation, classes and pupils-within-classes. The same factors were 
found at both levels but they accounted for different amounts of variance: at the class 
level factors reflecting attitudes towards the teacher and characteristics of the class as a 
whole were strong, while at the individual level factors reflecting attitudes towards the 
school and the individual pupil’s relations to the classmates were strong. Relations 
between personality variables and the attitude scales are also studied at the two levels 
of aggregation and implications of the results for the measurement of personality are 
discussed. 


INTRODUCTION 


With few exceptions, statistical analyses of educational research studies are based on 
the individual pupils’ scores. However, most educational processes do take place 
with the pupils organised into classes; thus the pupils are not independent units of 
observation but they do have a more or less common history of experience. It has 
been argued (e.g. Peckham et al., 1969) that when classes are sampled, class means 
should be analysed instead of pupils’ scores. However, Cronbach (1976) claimed that 
neither analyses at the individual level nor analyses at the class level yield a sufficiently 
complete picture; instead the hierarchical nature of the observations should be 
clarified and the individual scores decomposed into components for different levels of 
aggregation, to obtain separate estimates for different levels. It was shown, both 
theoretically and in empirical examples, that patterns of results from regression 
analysis, covariance analysis, and multivariate analysis may be drastically different at 
different levels of aggregation. 


The present paper presents within-class and between-class factor analyses of a 
questionnaire designed for assessing attitudes towards the school, the teacher, and 
classmates. Some or all of these aspects can be suspected to be sensitive to variation 
between classes; an ordinary factor analysis based on the individual pupils’ scores is 
therefore likely to fail to reflect the true dimensionality of the responses. 


The paper has both a methodological and a substantive purpose and, to add to 
both of these, relations between the dimensions established in the factor analysis and 
personality variables will be studied. 


METHOD 
Instruments 
The attitude questionnaire was originally constructed by Johannesson (1960), and 
called Our Class. Here, however, a somewhat shortened version, developed within 
the DPA-project (Didactical Process Analysis, Bredänge et al., 1971), has been used. 
The questionnaire will be referred to as the SAW questionnaire (literally from the 
Swedish it means School And We). 


The SAW contains 40 questions or assertions, each of which is to be answered 
through circling one of the five alternatives always, often, sometimes, seldom and never. 
There are both positive and negative assertions, but the responses were coded in such 
a way that a higher code throughout represents a more positive attitude. 


When the SAW was constructed the questions were classified into three groups 
to yield scores on three different scales: one assessing attitude towards the school (18 
items: 7 positive and 11 negative); one measuring attitude towards the teacher (9 
items, all positive); and one measuring attitude towards classmates (13 items: 6 
positive and 7 negative). The grouping was validated on the basis of measures of 
internal consistency, but the dimensionality of the questionnaire has never been 
investigated with factor analysis. 


Pupil personality was measured with a translation into Swedish of the High 
School Personality Questionnaire (Cattell et al., 1957). However, the 13 dimensions 
purportedly measured by the HSPQ could not be replicated in factor analyses of the 
version used; therefore the items were reorganised into three scales. The scales could 
be interpreted as measuring Introversion, Impulsivity, and Emotional Stability. The 
Introversion scale consists of 12 items, most of which measure the non-sociable aspect 
of introversion. The Impulsivity scale contains 16 items, most of which reflect 
adventurousness and weak superego control. The Stability scale consists of 18 items 
asking about the tendency not to get emotionally upset and nervous. 


Sample 
The data analysed here were originally collected within the DPA-project 
(Bredänge et al., 1971), and are here only used for secondary analyses. 


The DPA-project comprised 60 classes in grade 6, with in all 1,601 pupils. It was, 
however, impossible to obtain complete data from all the pupils. The SAW question- 
naire was answered by 1,435 pupils, which is the number of pupils on which the factor 
analyses are based. Class sizes varied between 17 and 30. 


The analysis of the relationship between the HSPQ scales and the SAW will, 
however, be based on a somewhat lower number of pupils; 1,319 pupils answered both 
these questionnaires. The reason for this additional attrition of the group is that the 
questionnaires were administered on two different occasions. 


Statistical analysis 

The data will be analysed at two levels of aggregation, pupils-within-classes and 
between classes. A third level, school, could in principle be recognised in the data, 
but since only a few schools were represented by more than one class, analyses taking 
into account this third level would not be informative. 


Cronbach (1976) recommends that in two-level analyses, each pupil’s score is 
transformed into two components: the class mean and the deviation of the raw score 
from the class mean. Covariance matrices are then computed, using the class means 
for the between-class covariance matrix, and the deviation scores for the within-class 
matrix; in both cases, however, with the total number of pupils as the number of 
observations. This means that the estimates are weighted in relation to the number 
of pupils, and the ordinary pooled covariance matrix is obtained as the sum of the 
between-class and within-class covariance matrices. 
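
The decomposition recommended by Cronbach can be illustrated with a short sketch. It assumes that both matrices are divided by the total number of pupils N (rather than N − 1), and it uses synthetic scores; the function name `two_level_covariances` and all data are hypothetical.

```python
import numpy as np

def two_level_covariances(scores, class_ids):
    """Split pupil scores into class means and within-class deviations and
    return the between-class and within-class covariance matrices, both
    computed with the total number of pupils as the number of observations."""
    scores = np.asarray(scores, dtype=float)
    class_ids = np.asarray(class_ids)
    n_pupils = scores.shape[0]
    grand_mean = scores.mean(axis=0)

    # Give every pupil the mean of his or her own class.
    class_means = np.empty_like(scores)
    for c in np.unique(class_ids):
        members = class_ids == c
        class_means[members] = scores[members].mean(axis=0)

    between_part = class_means - grand_mean   # class-level component
    within_part = scores - class_means        # deviation from the class mean
    s_between = between_part.T @ between_part / n_pupils
    s_within = within_part.T @ within_part / n_pupils
    return s_between, s_within

# Synthetic illustration: 5 classes of unequal size, 3 variables.
rng = np.random.default_rng(1)
class_ids = np.repeat(np.arange(5), [4, 6, 5, 7, 3])
scores = rng.normal(size=(class_ids.size, 3)) + 0.5 * class_ids[:, None]

s_between, s_within = two_level_covariances(scores, class_ids)
centred = scores - scores.mean(axis=0)
s_pooled = centred.T @ centred / len(scores)
print(np.allclose(s_between + s_within, s_pooled))
```

Because each class mean is repeated once per pupil, larger classes automatically carry more weight in the between-class matrix, and the two matrices add up to the covariance matrix of the undivided data.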


An estimate of the intraclass correlation for a variable can be obtained through 
forming the ratio of the between-class variance to the total variance. This 
estimate is biased, since it includes the variance between classes that would result 
from random assignment of pupils to classes (Härnqvist, 1978). However, Cronbach 
(1976) argued that pupils within classes must be considered as fixed; each class has a 
unique history and therefore it is not reasonable to speculate about the possible 
results of another particular assignment of pupils. 

The factor analyses will be based on scale-free covariance matrices, in which each 
element in the covariance matrices for the within- and between-class levels is divided 
by the product of the standard deviations, at the pooled level, for the pair of variables 
involved. The diagonal contains, for the between-class level, the intraclass correlations 
and, for the within-class level, 1 minus the intraclass correlation. These two matrices 
sum to the ordinary correlation matrix for the pooled data (cf. Cronbach, 1976; 
Härnqvist, 1978). Thus the total variance factor analysed at the between-class level 
is the sum of the intraclass correlations, and the remainder of the variance is analysed 
at the within-class level. 
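
A minimal sketch of this rescaling follows, assuming that between-class and within-class covariance matrices have already been computed. The function name `scale_free` and the two small matrices are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def scale_free(s_between, s_within):
    """Divide each element of both matrices by the product of the pooled
    standard deviations of the two variables involved."""
    s_between = np.asarray(s_between, dtype=float)
    s_within = np.asarray(s_within, dtype=float)
    pooled = s_between + s_within
    sd = np.sqrt(np.diag(pooled))      # pooled standard deviations
    scale = np.outer(sd, sd)
    return s_between / scale, s_within / scale

# Hypothetical two-variable example.
s_b = [[1.0, 0.5], [0.5, 2.0]]
s_w = [[3.0, 1.0], [1.0, 6.0]]
r_between, r_within = scale_free(s_b, s_w)

intraclass = np.diag(r_between)               # intraclass correlations
total_between_variance = np.trace(r_between)  # variance analysed between classes
total_within_variance = np.trace(r_within)    # remainder, analysed within classes
print(intraclass, total_between_variance, total_within_variance)
```

With p variables the two traces sum to p, which is why the amounts of variance analysed at the two levels together equal the number of items.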


The EFAP program of Jöreskog and Sörbom (1976) was used to compute 
maximum likelihood solutions, at the within-class and between-class levels, under 
different assumptions as to the number of factors, and these solutions were followed 
by Varimax rotations. The maximum likelihood approach allows a statistical test of 
the hypothesis that a certain number of common factors is sufficient to account for 
the structure of correlations. Considering, among other things, the problems in 
determining the actual degrees of freedom, however, little importance was attached to 
these statistical tests. 


RESULTS 
Results will be presented from five-factor solutions, since the factors in those 
solutions seemed to reflect dimensions of interest from a substantive point of view. 
Table 1 presents the items with high loadings in the five factors in the between-class 
and within-class analyses, along with the intraclass correlations. Rather surprisingly, 
the same items tended to load highly at both levels. In the few cases where 
an item loaded highly in the analysis at one level only it has been included in Table 1 
anyway. 


TABLE 1 

THE ITEMS IN THE SAW QUESTIONNAIRE WITH HIGH LOADINGS IN THE TWO-LEVEL FACTOR ANALYSES. 
FACTOR LOADINGS SHOWN ARE VARIMAX-ROTATED LOADINGS FROM MAXIMUM LIKELIHOOD SOLUTIONS 
(A question mark marks a digit that is illegible in the source.) 

                                                               Loading
                                                          Within    Between    Intraclass
Item                                                      classes   classes    correlation

Factor 1
5. It is fun to go to school                              .73       .27        .11
9. It is boring to go to school                           .73       .29        .11
25. I find the lessons boring                             .67       .29        .13
22. Work at school is dull and monotonous                 .67       .29        .12
18. I think that the lessons at school pass slowly        .58       .25        .10
6. I think that the work in lessons is fun                .61       .21        .09
29. I want to leave school earlier in the day             .53       .21        .09
30. I think that lessons at school pass quickly           .50       .20        .07
4. In the mornings I want to stay home from school        .50       .18        .08
15. In our class the lessons are fun and interesting      .48       .19        .15
21. It would be more fun if we were allowed to
    do what we want in the lessons                        .52       .23        .11
11. It would be better to have a job than go to school    .47       .18        .09
37. Work at school is good and has variety                .55       .19        .12

Factor 2
33. Our teacher is nice and kind                          .56       .53        .36
17. Our teacher is calm and good tempered                 .54       .48        .30
35. Our teacher keeps promises                            .51       .41        .27
38. Our teacher treats all pupils alike                   .47       .37        .23
10. Our teacher listens to our questions                  .48       .32        .19
13. Our teacher helps us a lot                            .47       .31        .15

Factor 3
23. My classmates tell me off                             .64       .22        .07
19. I have enemies among my classmates during the breaks  .55       .13        .07
24. I feel sad and lonely at school                       .54       .13        .06
20. All my classmates are kind to me                      .52       .11        .07

Factor 4
26. In our class all the pupils are good friends          .58       .28        .1?
27. In our class the pupils help each other               .49       .29        .14
20. All my classmates are kind to me                      .38       .?7        .07
39. In our class we stay together during the breaks       .34       .25        .13
32. In our class the pupils quarrel with each other       .28       .23        .12

Factor 5
16. In our class we don't do much school work             .43       .22        .13
7. During the lessons we are calm and quiet               .27       .36        .26
14. During the breaks the pupils in our class fight       .43       .13        .13
32. In our class the pupils quarrel with each other       .37       .19        .12
40. In our class we do exactly as the teacher says        .23       .24        .18


Factor 1 loads highly on items measuring attitude towards the school and 
schoolwork and it will be called the School factor. The loadings in the within-class 
analysis generally are two to three times as large as those in the between-class analysis, 
but the factor is clearly established at both levels. 


Factor 2 is defined by items measuring attitudes to, and perceptions of, the 
teacher, and it will be called the Teacher factor. The loadings in the between-class 
analysis tend to be almost as large as those in the within-class analysis, and there are 
sizable intraclass correlations for most of the items with a high loading in this factor. 
Each class had a different teacher so a large between-class variance is to be expected, 
but it is also interesting to note that there are systematic individual differences within 
the classes in the perception of the teacher. 


Factor 3 loads highly on a group of items asking about the individual pupil's 
relations to classmates, and it will be labelled Relations to Classmates. The loadings 
at the within-class level are three to four times as large as those at the between-class 
level, and the items loading this factor have the lowest intraclass correlations. 


Factor 4 is also defined by items asking about social relations; in contrast with 
factor 3, however, these items refer to social relations within the class as a whole. The 
factor will be referred to as the Class Relations factor. The loadings at the within-class 
level are two times, or less, as large as the loadings at the between-class level, and the 
intraclass correlations are of an intermediate size. 


Factor 5, finally, loads highly on items referring to the behaviour or discipline of 
the class, and the factor will be called the Class Discipline factor. Both the intraclass 
correlations and the loadings in the between-class analysis vary greatly for the items 
loading on this factor, but for some items they are sizable. 


The factor analysis thus shows that the items originally classified as measuring 
attitudes towards the school and the teacher each form a separate factor. However, 
the items constructed to measure attitudes towards classmates in fact form three 
factors: one reflecting the individual pupil’s relations to classmates, one reflecting 
social relations within the class, and one reflecting the discipline of the class. 


These five factors could be identified at both levels, even though the relative size 
of the loadings at the within-class level and the between-class level is different for the 
factors. This appears in greater detail from Table 2 where the amount of variance 
explained by the factors at the two levels is shown. 


TABLE 2 

CONTRIBUTIONS OF VARIANCE BY THE FACTORS IN THE WITHIN- AND 
BETWEEN-CLASS ANALYSES 

                              Within classes          Between classes
Factor                        Amount   per cent*      Amount   per cent*

School                         5.31       15           1.06       20
Teacher                        2.75        8           1.77       33
Relations to classmates        1.73        5           0.18        3
Class relations                1.32        4           0.44        8
Class discipline               0.94        3           0.43        8
Total                         12.05       35           3.88       72

* The percentages have been computed from the total amounts of variance analysed, 
34.6 and 5.4 for the within- and between-class analyses, respectively. 


In the between-class analysis the Teacher factor accounts for most variance, then 
follows the School factor, the Class Discipline factor, the Class Relations factor and 
the Relations to Classmates factor. In the within-class analysis the School factor is 
the one accounting for most variance, followed by the Teacher factor, the Relations 
to Classmates factor, the Class Relations factor and the Class Discipline factor. Thus 
even though it is possible to find the same factors at both levels of the hierarchical 
analysis, it is obvious that different sources of variance influence the factors differently. 
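
The percentages in Table 2 follow directly from the amounts of variance and the totals analysed at each level. The short check below uses the values as transcribed from Table 2; the variable and function names are hypothetical.

```python
# Amounts of variance contributed by each factor (values as read from Table 2).
within = {"School": 5.31, "Teacher": 2.75, "Relations to classmates": 1.73,
          "Class relations": 1.32, "Class discipline": 0.94}
between = {"School": 1.06, "Teacher": 1.77, "Relations to classmates": 0.18,
           "Class relations": 0.44, "Class discipline": 0.43}

# Total variance analysed at each level (footnote to Table 2); 34.6 + 5.4 = 40 items.
TOTAL_WITHIN = 34.6
TOTAL_BETWEEN = 5.4

def percentages(amounts, total_analysed):
    """Express each factor's variance as a rounded percentage of the total analysed."""
    return {name: round(100 * amount / total_analysed)
            for name, amount in amounts.items()}

pct_within = percentages(within, TOTAL_WITHIN)
pct_between = percentages(between, TOTAL_BETWEEN)

for name in within:
    print(f"{name:<26}{pct_within[name]:>4}{pct_between[name]:>6}")
```

The same factor can thus account for a small share of the large within-class variance and a large share of the much smaller between-class variance, as the Teacher factor does.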


To study correlations, at the within-class level and the between-class level, 
between the personality variables and the attitude factors, five scales were constructed 
through assigning each item in the SAW questionnaire to the factor on which it loaded 
highest. Statistical characteristics of the scales are presented in Table 3. As can be 
expected the intraclass correlations vary greatly between the scales, with the Teacher 
scale having the largest intraclass correlation and the Relations to Classmates scale 
having the lowest intraclass correlation. The personality variables, characteristics 
of which have also been entered in Table 3, tend to have the lowest intraclass corre- 
lations; as is evident from the F-ratios from one-way analyses of variance with class 
as the factor they are significant, however, for both Introversion and Impulsivity. 


Table 4 presents correlations between the SAW scales and the personality variables 
at the two levels. The correlations have been computed from between-class and 
within-class covariance matrices which were standardised to have unities in the 
diagonal. 
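
In contrast with the scaling used for the factor analyses, each level is here standardised by its own standard deviations. A minimal sketch, with a hypothetical function name and hypothetical covariance values:

```python
import numpy as np

def standardise(cov):
    """Rescale a covariance matrix so that it has unities in the diagonal."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# Hypothetical covariance matrices for one attitude scale and one personality scale.
cov_between = [[4.0, 2.0], [2.0, 9.0]]
cov_within = [[16.0, -6.0], [-6.0, 25.0]]

r_between = standardise(cov_between)[0, 1]  # correlation at the class level
r_within = standardise(cov_within)[0, 1]    # correlation at the individual level
print(round(r_between, 3), round(r_within, 3))
```

Because each matrix is standardised separately, a correlation at the class level can differ in size, and even in sign, from the correlation between the same two variables within classes.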


Impulsivity is the personality variable which accounts for most variance in the 
SAW scales. In the within-class analysis there are rather strong negative correlations 
with the School and Teacher attitude scales; this does not to the same extent hold true 
at the class level, so had pooled correlations been computed instead, weaker correla- 
tions would have been found with Impulsivity. In the between-class analysis there is 
a rather high negative correlation between Impulsivity and the Class discipline scale. 


TABLE 3 

CHARACTERISTICS OF THE SCALES MEASURING ATTITUDE AND PERSONALITY 

                                              Intraclass
Scale                        No. of items     correlation     F-ratio*

Attitude variables
  School                          15              .16            4.24
  Teacher                         12              .32           11.21
  Relations to classmates          6              .08            1.91
  Class relations                  3              .17            3.85
  Class discipline                 4              .22            6.93
Personality variables
  Introversion                    14              .06            1.42
  Impulsivity                     16              .07            1.51
  Stability                       18              .05            1.25

* The F-ratios are computed in one-way analyses of variance with class as factor, critical value 
F.95 (59, ∞) ≈ 1.39. 


TABLE 4 

CORRELATIONS BETWEEN THE PERSONALITY VARIABLES AND THE ATTITUDE 
VARIABLES AT WITHIN- AND BETWEEN-CLASS LEVELS 

                              Introversion        Impulsivity         Stability
                             Within  Between    Within  Between    Within  Between

School                        -.04     .06       -.38    -.16       -.01    -.05
Teacher                        .03     .06       -.32    -.04        .01    -.03
Relations to classmates       -.27    -.29       -.10    -.23        .28     .28
Class relations               -.15    -.28       -.11    -.22        .09    -.10
Class discipline               .02     .07       -.21    -.34        .10    -.16


The other personality variables yield few correlations worth mentioning. It can 
be observed, however, that Introversion is negatively correlated with the Relations to 
Classmates scale at both the class level and the individual level, and that at the class 
level there is also a negative correlation between Introversion and the Class Relations 
scale. Stability is positively correlated with Relations to Classmates, both in the 
between-class analysis and in the within-class analysis. 


DISCUSSION AND CONCLUSIONS 


The factor analyses at the two levels of aggregation resulted in the same factors 
at both levels, in spite of the fact that no constraints, other than those dictated by the 
number of factors and the Varimax criterion, were imposed. An empirical example 
that the same factors are not necessarily found at both levels in this kind of analysis 
is given by Härnqvist (1978). 


But it was found that the two levels contribute differently to the factor variance. 
There are factors mainly influenced by differences between pupils within classes such 
as the Relations to Classmates factor and the School factor, but there are also factors 
heavily influenced by differences between classes, such as the Teacher factor and the 
Class Discipline factor. In an ordinary factor analysis, disregarding the hierarchical 
nature of the observations, it would necessarily have been assumed that only differences 
between pupils are reflected in the factor variance. 


It appears that the most interesting pattern of correlations with the personality 
variables is found at the class level. Impulsivity is correlated with Class Discipline, 
and Introversion with Class Relations and Relations to Classmates. It is quite 
interesting that personality variables, which are usually taken as purely individual 
measures, do show interpretable correlations also at a higher level of aggregation. 


Two interpretations can be suggested to account for this. In the first place it is 
clear that even if assignment to classes is random there will be differences between the 
classes with respect to personality and it is possible that even slight differences in the 
composition of the classes can have important effects on the flow of events in the class. 
But, secondly, it must also be pointed out that both questionnaires were answered 
while the pupils attended the same class; therefore the measurement of personality is 
not independent of class. For example, if a teacher places only little weight on the 
social relations within the class, this is also likely to result in a higher mean of the 
class on the introversion scale. 


Since in this case both questionnaires were answered while the pupils attended the 
same class it is not possible to decide which of these interpretations of the correlations 
at the class level is the correct one. However, since there were significant intraclass 
correlations also for the personality variables it does seem that there may be an effect 
of class membership on the measurement of personality. 


The 130 items in the HSPQ have been analysed for class effects. There were only 
a few significant intraclass correlations, but for a group of items with specific reference 
to the school and the class, significant intraclass correlations were found. The 
conclusion that such items with a specific situational reference are sensitive to class 
effects does have implications for the measurement of personality. 


Bennett and Youngman (1973, cf. Bennett, 1973) criticised the Junior Eysenck 
Personality Inventory for asking questions framed in too general terms and they 
claimed that “In the school setting it seems likely that institutional demands are 
sufficiently strong to swamp the effects of individual differences in personality. In 
such a situation a general inventory like the JEPI is of limited utility and validity. . . . 
It would, therefore, seem more useful to design inventories which have a clear meaning 
in the particular situations encountered” (Bennett and Youngman, 1973, p. 233). 
The rationale behind this suggestion is sound enough, but if proper account is not 
taken of the class effects which are likely to result from such an approach this may 
create more severe problems in educational research than those caused by attempts to 
measure personality without clear reference to context. Since classes are different, 
references to specific situations will have different meaning for the pupils in different 
classes, and such differences will enter systematically into the responses. If such data 
are analysed as individual data there is a great risk that spurious relationships will be 
found between the personality variables and other variables, such as achievement and 
attitude variables. 


To guard against such spurious relationships, and also to study relationships at 
the class level in their own right, the methodology exemplified in this paper does seem 
well suited. 


PREDICTIVE VALIDITY OF A SCREENING BATTERY 
FOR CHILDREN ‘AT RISK’ FOR READING FAILURE 


Summary. A three-year follow-up of a sample of 407 white boys entering urban 
Catholic schools in Perth is reported. The children were given a series of eight tests 
previously shown to predict subsequent reading performance. Using discriminant 
analysis, it was shown that 75 per cent of children identified as having a high risk of 
reading failure showed severe or mild symptoms of such failure in terms of reading 
comprehension in the third year of schooling. 


INTRODUCTION 


SATZ and colleagues, at the University of Florida, have developed a screening battery 
which is designed to detect kindergarten age children who will, in later years, become 
disabled, average or superior readers (Satz and Friel, 1973; Satz and Friel, 1974; 
Satz, Friel and Rudegeair, 1976; Satz, Taylor, Friel and Fletcher, 1978). The test 
battery was standardised on the total population of white boys (N = 497) who started 
kindergarten in 1970 in Alachua County, Florida. This population was drawn from 
fourteen urban and six rural elementary schools. The battery, given during early 
kindergarten, was later validated against independent reading criteria at the end of 
Grades I (1972), II (1973), III (1974) and V (1976). Careful tracking procedures had 
kept the attrition rate very low—approximately 90 per cent of the original population 
were followed through all five grades. Separate cross-validation studies of the battery 
were completed on a sample of 181 white boys who began kindergarten in 1971 and 
whose reading scores were assessed three years later—at the end of second grade in 
1974; and on a mixed sample of kindergarten children (boys, girls, blacks and whites, 
N = 150) tested in 1974 with end of second grade follow-up in 1977. The latter cross- 
validation was based on the weights derived from the two standardisation populations 
of white boys. The results of these validation studies revealed that the tests given 
during kindergarten consistently identified over 90 per cent of the children destined to 
become severely disabled or superior readers in later years. Predictive classification, 
however, was lower for those children who fell in the mid range of the reading 
distribution—the mildly disabled and average groups (Satz and Friel, 1978). 


The purpose of the present study, undertaken by the Dyslexia Research Founda- 
tion and the Catholic Education Commission of Western Australia, was to determine 
whether the battery, given at the beginning of first grade (mean age = 68 months) could 
predict reading group membership at the end of the third grade on a sample of white 
Australian boys. 


An additional purpose of this study was to institute a series of first-grade inter- 
vention programmes for the predicted high-risk children, should the predictions be 
of satisfactory validity after the first year criterion testing. 


METHOD 
Sample 
The sample consisted of 407 white boys. This group represented all of the boys 
who entered Year 1 in February, 1975, in 23 urban Catholic schools (mean age 68.2 
months), excepting Aboriginal boys and a few children (recent immigrants) who 
could neither speak nor understand English. The schools were selected to give a 
representative sample of the socio-economic distribution of Catholic schools in the Perth 
metropolitan area. 


Predictive tests 

The test battery consisted of eight tests and one non-test variable (socio-economic 
status). These tests were selected from a larger standard battery based on stepwise 
regression procedures across follow-up years. 


(1) Alphabet Recitation. Verbal recitation of A, B, C ... Z. Score: number of letters 
named, regardless of order. 


(2) Peabody Picture Vocabulary Test (Dunn, 1959). Score: intelligence quotient. 
(3) Recognition-Discrimination (Small, 1968). A visual-perceptual task requiring the child 
to identify a geometric stimulus design among a group of four figures, three of which were 
rotated and/or similar in shape to the stimulus figure (15 cards). Score: number of correct 
responses. 


(4) Finger Localisation (Benton, 1956). Somatosensory test consisting of five levels of 
performance, four of which (1, 2, 4 and 5) are presumed to assess increasing levels of 
complexity. 

(i) Shielded unilateral stimulations made to the fingertips; shield removed between 
stimulations and child required to point to the finger touched with the index finger 
of his free hand. Five trials per hand, starting with preferred hand. 

(ii) Shielded unilateral stimulation made to the fingertips; child identified each 
stimulated finger on a corresponding diagram of an opened hand. Five trials per 
hand, starting with preferred hand. 

(iii) Shielded, randomised series of three bilateral and 10 unilateral stimulations made 
to the backs of child's hands; child waved hand(s) stimulated. Only bilateral trials 
scored. 

(iv) Shielded unilateral stimulations made to the finger tips; child recalled the number 
of the finger stimulated (following prior teaching of finger numbers). Five trials per 
hand, starting with preferred hand. 

(v) Shielded simultaneous bilateral stimulations made to pairs of finger tips; child 
recalled the number of the finger stimulated on each hand. Five pairs of stimula- 
tions. Score: number correct across all five levels. 

(5) Developmental Test of Visual-Motor Integration (Beery, 1967). Score: visual-motor 
‘age’ less chronological age. 

(6) Dichotic Listening (Satz, 1968). Measure of ear asymmetry in which child is 
presented with disparate pairs of numbers arriving simultaneously via stereo headphones 
every half-second. Child is required to recall number heard. Scores: total recall from both 
the right and left channels, and an ear asymmetry measure derived from the ratio 
(RC - LC)/(RC + LC). 

(7) Auditory Discrimination. Shortened, taped version of the Wepman Auditory Dis- 
crimination Test (Wepman, 1958). Child is required to recognise on 20 trials whether pairs 
of words heard through earphones were the same (a single word repeated) or different (two 
different, but similar sounding words). Score: sum of the ratio of correct 'same' responses 
and ratio of correct 'different' responses to total number of 'different' responses. 

(8) Socio-economic Status. Classification into above-average, average and below- 
average based on father's occupation and teachers’ ratings into the above three categories. 
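
The ear asymmetry measure in test (6) can be illustrated with a short sketch. It assumes that RC and LC are the numbers of digits correctly recalled from the right and left channels, and that the denominator of the ratio is their sum; the function name and the example scores are hypothetical and are not taken from the battery's materials.

```python
def ear_asymmetry(right_correct: int, left_correct: int) -> float:
    """Return (RC - LC) / (RC + LC).

    Positive values indicate better recall from the right channel,
    negative values better recall from the left channel.
    """
    total = right_correct + left_correct
    if total == 0:
        # The ratio is undefined when nothing was recalled from either ear.
        raise ValueError("no digits recalled from either channel")
    return (right_correct - left_correct) / total


# Hypothetical child: 18 digits recalled from the right channel, 12 from the left.
rc, lc = 18, 12
print(f"total recall = {rc + lc}, asymmetry = {ear_asymmetry(rc, lc):+.2f}")
```

Dividing by total recall keeps the index between -1 and +1, so children with different overall recall can be compared on the same scale.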


In the Florida studies, factor analysis of the standard battery has identified three 
major factors that comprised measures of sensori-motor-perceptual skill (Factor I), 
verbal-conceptual skill (Factor II), and verbal-cultural experience (Factor III). Each 
of these factors is represented in the test battery: Finger Localisation (I), Beery 
Developmental Test of Visual-Motor Integration (I), Recognition-Discrimination (I), 
Peabody Picture Vocabulary Test (II), Dichotic Digit Recall (II), Wepman Auditory 
Discrimination Test (III), Alphabet Recitation (III) and Socio-economic Status (III). 
Factor I tests have consistently accounted for most of the predictive variance (Satz 
et al., 1978). 


Procedure 

The children were tested individually within their own schools by experienced 
educational or clinical psychologists, using the first five tests of the Satz Predictive 
Test Battery. All testing took place within the first six weeks of the commencement 
of the school year. 


Immediately following, the Dichotic Listening and Auditory Discrimination tests 
were administered to all children by four final-year psychology students specially 
trained in the procedures and management of the specialised recording equipment 
used. 


The completed test protocols were sent to the Neuropsychology Laboratory of 
the University of Florida, where they were analysed by the SPSS programme 
DISCRIMINANT, in which each test score is multiplied by the generated coefficient for 
each significant discriminant function, giving a combined score for each function. The 
different composite scores, based on the various functions, were then combined into 
predictions. A four-way classification of predictions was used, with each child 
allocated a predictive rating: 


+ +  = severe risk of reading failure 
+    = mild risk of reading failure 
-    = predicted average reader 
- -  = predicted superior reader 
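
The scoring step described above, in which each test score is multiplied by a coefficient and summed within each discriminant function, can be sketched as follows. This is only an illustration of the weighted-sum idea: the coefficient values, the test names used as keys and the example child are all hypothetical, the sketch covers four of the tests only, and it does not reproduce the weights or the classification rule actually generated in the Florida analysis.

```python
from typing import Dict, List

# Hypothetical coefficients for two discriminant functions (illustrative only;
# the real weights came from the standardisation samples and are not given here).
COEFFICIENTS: List[Dict[str, float]] = [
    {"alphabet": 0.04, "peabody_iq": 0.02, "recognition": 0.10, "finger_loc": 0.05},
    {"alphabet": -0.03, "peabody_iq": 0.01, "recognition": 0.02, "finger_loc": -0.06},
]


def discriminant_scores(test_scores: Dict[str, float],
                        coefficients: List[Dict[str, float]] = COEFFICIENTS) -> List[float]:
    """Return one composite score per function: the sum of coefficient x test score."""
    return [
        sum(weights[name] * test_scores[name] for name in weights)
        for weights in coefficients
    ]


# Hypothetical child with made-up raw scores on four of the tests.
child = {"alphabet": 20, "peabody_iq": 100, "recognition": 10, "finger_loc": 30}
for index, score in enumerate(discriminant_scores(child), start=1):
    print(f"function {index}: composite score = {score:.2f}")
```

How the composite scores were then combined into the four predictive ratings is not specified in the text, so that step is deliberately left out of the sketch.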


Criterion reading measures 

At the end of Year 3 (November, 1977), individual reading measures were 
obtained on 81.6 per cent of the original sample (332/407). The Neale Analysis of 
Reading (Neale, 1966) (reading comprehension) and the St. Lucia Reading Test 
(Andrews, 1969) (word recognition) were the criterion reading measures used. On the 
basis of the comprehension criterion the children were divided into the following four 
reading groups: Severe (N=56), Mild (N=68), Average (N=155), and Superior 
(N=53). The Severe group was reading approximately 18 months behind age level 
whereas the Superior group was reading 25 months ahead of age level. 


RESULTS 
Classification 
Classification was examined by comparing the composite discriminant function 
predictions from Year 1 (1975) against the reading outcomes at the end of Year 3 
(1977). The results are presented in Table 1 (2 x 4 matrix) where the composite test 
predictions (rows) are reduced to high risk (Severe and Mild) and low risk (Average 
and Superior) and where the reading outcomes are distributed as columns (Severe, 
Mild, Average, Superior). 


TABLE 1 


PREDICTIVE CLASSIFICATION OF CHILDREN INTO READING GROUPS (YEAR III) 
BASED ON ABBREVIATED TEST BATTERY (YEAR I) 


Composite              Reading Groups (Comprehension Criterion) 
Predictions          Severe     Mild     Average    Superior 
+            N         49        40        52          3 
             %        (88)      (59)      (33)        (6) 
-            N          7        28       103         50 
             %        (12)      (41)      (67)       (94) 

Total                  56        68       155         53 


Inspection of this table reveals that the tests correctly predicted 88 per cent 
(N = 49) of the Severe group and 94 per cent (N = 50) of the Superior group, while 
misclassifying 41 per cent (N = 28) of the Mild group and 34 per cent (N = 52) of the 
Average group. The overall prediction was 72.9 per cent (242/332). 
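
The overall figure can be checked directly from the cell counts in Table 1. The sketch below is a reader's arithmetic check rather than part of the original analysis; the variable names are illustrative, and a prediction is counted as correct when a high-risk sign is followed by a Severe or Mild outcome, or a low-risk sign by an Average or Superior outcome.

```python
GROUPS = ["Severe", "Mild", "Average", "Superior"]

# Cell counts from Table 1 (comprehension criterion).
HIGH_RISK_PRED = {"Severe": 49, "Mild": 40, "Average": 52, "Superior": 3}
LOW_RISK_PRED = {"Severe": 7, "Mild": 28, "Average": 103, "Superior": 50}

# Outcome groups for which a high-risk prediction counts as correct.
HIGH_RISK_OUTCOMES = {"Severe", "Mild"}


def correct_count(group: str) -> int:
    """Number of children in a reading group whose prediction matched the outcome."""
    if group in HIGH_RISK_OUTCOMES:
        return HIGH_RISK_PRED[group]
    return LOW_RISK_PRED[group]


def group_total(group: str) -> int:
    """Total number of children in a reading group."""
    return HIGH_RISK_PRED[group] + LOW_RISK_PRED[group]


total_correct = sum(correct_count(g) for g in GROUPS)
total_n = sum(group_total(g) for g in GROUPS)
false_positives = HIGH_RISK_PRED["Average"] + HIGH_RISK_PRED["Superior"]

print(f"overall hit rate: {total_correct}/{total_n} = {100 * total_correct / total_n:.1f} per cent")
print(f"false positives (high-risk sign, Average or Superior outcome): {false_positives}")
```

The false-positive count produced here is the 55 children referred to in the Discussion.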


TABLE 2 


PROBABILITY OF DECISION RISK ASSOCIATED WITH COMPOSITE TEST PREDICTIONS 
(YEAR I-YEAR III) 


                           Reading Groups (Comprehension Criterion) 

Composite                                                      Ratio 
Predictions  Decision   Severe   Mild   Average   Superior   Correct   Proportion 
++           HR           33      13      15         0        46/61       0.75 
+            HR           16      27      37         3        43/83       0.52 
-            LR            3      17      51        10        61/81       0.75 
--           LR            4      11      52        40        92/107      0.86 

Total                     56      68     155        53 


The classification outcomes for a 4 x 4 matrix are presented in Table 2. In this 
table the discriminant composite predictions are reduced to four levels (rows) which 
enables one to compute the conditional probability for each of the composite test 
signs. This table reveals the outcomes for the severe high risk predictions (+ +) 
which would form the decisional basis for intervention programmes (experimental and 
control) in Year 1. Inspection of this table shows that the probability that a child 
would be high risk (HR), given a severe high risk composite score in Year 1 
(P(HR/+ +)) was .75. A much lower predictive outcome was revealed for the mild 
high risk composite score (P(HR/+) = .52). On this composite score, the error rate 
(false positives) is equivalent to the true level of prediction. 


The predictive accuracy was also quite high for both low risk composite test 
scores in Year 1 (P(LR/-) = .75 and P(LR/- -) = .86). Therefore, had a decision 
been made to withhold treatment for children who revealed these low risk composite 
scores, the decision would have been correct in approximately four out of five cases. 
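
The conditional probabilities quoted from Table 2 can be recomputed from its cell counts, as in the sketch below. Two readings of the table are assumptions made for this check: the Mild count in the "--" row is taken as 11, the value implied by the Mild column total of 68 and the row denominator of 107, and the "-" row is taken to sum to 81 children. The names used are illustrative.

```python
# Rows of Table 2: sign -> (decision, [Severe, Mild, Average, Superior]).
# ASSUMPTION: the "--" row Mild count is read as 11 so that the column
# total (68) and the row total (107) are both satisfied.
TABLE_2 = {
    "++": ("HR", [33, 13, 15, 0]),
    "+": ("HR", [16, 27, 37, 3]),
    "-": ("LR", [3, 17, 51, 10]),
    "--": ("LR", [4, 11, 52, 40]),
}


def correct_and_total(sign: str) -> tuple:
    """Return (children correctly classified, children given this sign)."""
    decision, counts = TABLE_2[sign]
    severe, mild, average, superior = counts
    correct = severe + mild if decision == "HR" else average + superior
    return correct, sum(counts)


for sign in TABLE_2:
    correct, total = correct_and_total(sign)
    print(f"{sign:>2}: {correct}/{total} = {correct / total:.2f}")

# Pooled accuracy of a decision to withhold treatment on either low-risk sign.
low_correct = sum(correct_and_total(s)[0] for s in ("-", "--"))
low_total = sum(correct_and_total(s)[1] for s in ("-", "--"))
print(f"low-risk signs combined: {low_correct}/{low_total} = {low_correct / low_total:.2f}")
```

On these counts the pooled low-risk accuracy comes to roughly four in five, which agrees with the statement in the text.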


Similar results occurred when word-recognition was used as the criterion measure: 
the tests correctly predicted 86 per cent of the Severe group and 90 per cent of the 
Superior group. Misclassification occurred in 33 per cent of the Mild group and 32 
per cent of the Average group. The overall hit rate was slightly higher with word 
recognition as the criterion measure (74.1 per cent). The conditional probabilities for 
each of the composite test signs were similar also. The ratios of correct predictions 
for each category (based on word-recognition scores at the end of third grade) follow, 
with ratios based on comprehension scores given in parentheses after each: 


Decision to intervene based on severe risk sign        .68 (.75) 
Decision to intervene based on mild risk sign          .51 (.52) 
Decision not to intervene based on average sign        .80 (.75) 
Decision not to intervene based on superior sign       .89 (.86) 


Predictive ranking 
Ranking of the predictive tests, according to the variance for which they account, 
has varied slightly in the studies of Satz and colleagues, related to differing ages of 
applying criterion measures, different criterion measures and different populations. It 
is important to remember that very slight differences in variance—as low as two or 
three per cent—can alter ranked position, and as correlations between tests highly 
weighted with the same factors are high, there is relatively little significance, for 
example, in whether Recognition-Discrimination is ranked higher than Finger- 
Localisation, or vice versa. It is more important to note whether factors are ranked 
differently and whether or not all factors are represented in the higher rankings. 


The ranking of the first four predictive tests when reading comprehension was the 
criterion measure was Peabody Picture Vocabulary Test (Factor II), 
Recognition-Discrimination (Factor I), Alphabet Recitation (Factor III) and Socio-economic 
Status (III). The ranking using the word-recognition criterion was Alphabet Recitation 
(III), Recognition-Discrimination (I), Dichotic Digit Recall (II) and 
Finger-Localisation (I). 


It will be noted that all three factors were represented in the highest ranking 
predictors of both word-recognition and comprehension abilities at the end of third 
grade. 


DISCUSSION 


The preceding results provide additional cross-cultural support for the predictive 
efficiency of the abbreviated screening battery. The tests given at the beginning of 
first grade correctly predicted the achievement outcomes of a sample of Australian 
boys at the end of third grade—particularly those destined to extremes in the reading 
achievement distribution. With comprehension as the criterion measure, 88 per cent 
of the severely retarded readers and 94 per cent of the superior readers were correctly 
identified. These predictions held up in spite of the fact that a significant percentage 
of the children (30 per cent in the total Catholic system, up to 80 per cent in individual 
schools) came from a bi-lingual home, i.e. a home in which a language other than 
English was spoken. Secondly, there was no attempt made to restrict remedial inter- 
ventions within the schools. A third factor which could have been expected to reduce 
the predictive validity of the tests under Australian conditions is the fact that only a 
minority of Australian children have pre-school or kindergarten experience, while the 
majority go directly into first grade from the home. Each of the preceding factors 
would tend to inflate spuriously the false positive rate. 


Of the 55 children who were incorrectly classified as being 'at risk' (Table 1), 
30 (55 per cent) had received pre-school education. A further study is indicated to 
evaluate to what extent pre-school experience distorts the predictive accuracy of the 
abbreviated test battery. 


Despite the high number of false positives in Table 1, it was shown that this 
misclassification error could be reduced when predictive decisions (conditional 
probabilities) were based only on children who evidenced severe high risk signs 
(P(HR/+ +) = .75). This more conservative decision strategy (Table 2), which would 
screen in a smaller number of children, would nevertheless maximise the valid positive 
rate while minimising the false positive rate. In contrast, decisions to intervene on less 
severe high risk signs (P(HR/+) = .52) would have produced too many false positive 
misclassifications. On this basis, only the more conservative intervention strategies 
would be recommended with this abbreviated test battery. 


The change in the predictive rankings according to the criterion measure used is 
of interest even though differing rank order may represent only a very slight difference 
in the amount of variance accounted for. The previous studies of Satz and colleagues 
have consistently reported a predominance of Factor I (sensori-motor-perceptual) 
tests in the high-ranking predictors. The present study again confirmed this finding 
when word-recognition was the criterion measure. Factor I tests did not predominate 
when reading comprehension was the criterion measure. One possible explanation for 
this finding in the Australian study is that the population contained a high percentage 
of bi-lingual homes (30 per cent)—a condition more likely to affect reading com- 
prehension than word-recognition skills. 
EFFECT OF PICTURES ON LEARNING TO READ 
COMMON NOUNS 


By R. J. LANG AND R. T. SOLMAN* 
(Department of Psychology, University of New South Wales) 


SUMMARY. Three experiments are reported to investigate the effects of pictures on 
learning to read common nouns with samples of kindergarten children. Comparisons 
were made between three conditions: the absence of a picture, the presence of a related 
picture, and the presence of an unrelated picture. Using large projected images, no 
differences were detected. To check whether this result was attributable to the method 
of presentation, the experiment was repeated using word cards, with similar results. 
Finally, besides further manipulation of the spacing effects, a third experiment investi- 
gated the effects of drawing children’s attention to the link between word and picture. 
It was concluded that pictorial information could be used with advantage, if the child 
was aware of the relationship between word and picture, and that the use of spacing 
did appear to facilitate learning (as a post-hoc analysis). 


INTRODUCTION 


STUDIES conducted by Braun (1969), Harris (1967), Samuels (1967), and more recently 
by Harzem et al. (1976), have done much to discredit the assumption that learning is 
facilitated when the word to be learnt is presented simultaneously with a picture of the 
object named by that word. Indeed, when commenting upon their own results, 
Harzem et al. (1976) concluded that “what is to be avoided is a direct equivalence 
between the picture and the printed text, especially in the case of the presentation of 
single words” (p. 322). 


In view of the long-standing pedagogic practice of presenting beginning readers 
with words in conjunction with their pictorial representation, it is surprising that 
empirical examination suggests that word learning is not facilitated and may even be 
inhibited (e.g., Harzem et al., 1976). Mental imagery is an effective method of 
facilitating associative learning (e.g., Bower, 1972) and the 'levels of processing' 
analysis of memory (Craik and Lockhart, 1972) suggests that elaboration and/or depth 
of processing aids recall. Why then does the post-learning performance of young 
children fail to reflect an advantage of pictorial elaboration in the case of visually 
presented words? One possibility is that when a picture is presented simultaneously 
with a printed word, the child attends mainly to the picture and as a consequence of 
this uneven division of attention, post-learning performance for the word compares 
unfavourably with similar performance in a condition where the word is presented 
alone. This suggestion has received some empirical support (Duell, 1968), and it is 
examined further in the studies reported here. 


METHOD 


The task was word acquisition. Beginning readers were asked to read words 
during a three-phase experimental session. These sessions were conducted on a 
one-to-one basis, and a set of 28 words was first presented to establish a pre-test score, 
then presented during a word acquisition phase, and finally presented to obtain a 
post-test score. 

The children taking part in the experiments were selected from kindergarten 
classes in two Sydney schools. This selection was random, with the constraint that 
there be equal numbers of boys and girls, and with the exclusion of any child having 
significant physical, social or emotional handicaps, or obvious learning disabilities. 
Boys and girls were allocated randomly to experimental groups, and the members of 
each group were systematically assigned to experimental sessions which spanned both 
the full testing period and the full testing day. The spread of group members over the 
full testing period controlled for the effects of learning outside the experiment, and 
the spread over a testing day controlled for the effects of fatigue. 


The bulk of the 28 stimulus words were common nouns selected from the 
Schonell RI (1974) and ACER Individual Reading Test (Allen, 1974), and the 
remainder were words deemed appropriate by experienced teachers in the schools. The 
words were considered suitable for sight recognition purposes because they referred 
to objects familiar to young children, and they could be represented in colour, in 
pictorial form, in a simple, unambiguous manner. 


EXPERIMENT 1 


In the previous studies already mentioned, when children were required to learn a 
restricted sight vocabulary by means of a look-say method, it was found that if words 
and pictures were presented simultaneously, pictures interfered with learning, perhaps 
as a consequence of an uneven division of attention. Specifically, the simultaneous 
presentation of word and picture may have temporarily overloaded the child’s 
information processing system (e.g., see Kahneman, 1973), and since the pictorial 
information probably proved to be more salient it was processed first. The stimulus 
presentation time used in these studies was considerably longer than the time required 
for an eye movement (approximately 250 msec), and the children were therefore at 
liberty to redirect attention to the word or to switch attention between the word and 
the picture. In practice, however, it seems that they concentrate on processing the 
picture rather than redirecting attention. Specifically, Duell (1968) found that 72 per 
cent of her beginning readers did not spontaneously attend to the words. 


This study was designed as a test of the hypothesis of 'divided attention'. 
Performance under conditions of simultaneous presentation of words and pictures 
was compared with performance in conditions where the picture followed presentation 
of the word after an interval of one second, and it was suggested that children in these 
latter conditions would attend to the word in the first instance, and as a consequence 
they would learn more words and out-perform children in the simultaneous conditions. 
In addition, and in an attempt to partially replicate the study conducted by Harzem 
et al. (1976), separate groups of children were presented (a) with the word alone, 
(b) with the word and a related picture, and (c) with the word and an unrelated picture. 
Harzem et al. (1976) found that the condition in which the word was presented alone 
was the most conducive for learning, and that the presence of a related picture 
depressed performance more than the presence of an unrelated picture. 


Method 

Sixty children were selected randomly from five kindergarten classes (168 pupils) 
in a large metropolitan government primary school (total enrolment, 1050). The 
selected children were randomly allocated to six experimental groups, with the 
restriction that each group should contain five boys and five girls. Each child took 
part in a single experimental session of 20 to 50 minutes' duration. This experimental 
session consisted of an adjustment period, a recognition test (pre-test) of the 28 
stimulus words, familiarisation with the pictures (for children in conditions using 
pictures), an acquisition phase, and a further recognition test (post-test) of the words. 
Three groups of children participated in experimental conditions where pictures (when 
used) were presented simultaneously with words, and the remaining three groups 
participated in conditions where pictures (when used) followed words after an interval 
of one second. Within these simultaneous and spaced conditions, one group of 
children learnt the words in the absence of pictures (for the spaced condition the 
presentation of the word was followed after one second by a blank slide), a second 
group was presented with words and unrelated pictures (unrelated pictures were 
selected at random from the 28 pictures minus the one representing the word to be 
learnt), and a third group learnt the words in the presence of their pictorial repre- 
sentations. 


Two Hanimex manually operated projectors (Syllabus 4000, 100 lens) were used 
to project slides of words and/or pictures onto a standard Hanimex Lentolux Screen 
(1.52 x 1.52 metres). Words were located on the bottom one-third of the slides and 
pictures occupied the top two-thirds. To cater for the specified experimental con- 
ditions, 28 slides were prepared with the word in the lower third and the remainder 
blank, and 28 were prepared with the picture in the upper two-thirds and the remainder 
blank. In addition, 28 composite slides of words and related pictures were also 
prepared. Two slide projectors were available for use in the experiment. Only one 
projector was required for the simultaneous-word-alone condition (presentation of the 
'word' slides), and for the simultaneous-word-related-picture condition (presentation 
of 'composite' slides). For all other conditions it was necessary to use both 
projectors as the spaced-word-alone condition required the presentation of a blank slide 
one second after the arrival of the word, and the remaining three conditions required 
the superimposition (after an interval of either zero seconds or one second) of a picture 
above the word. 


The child on arrival at the experimental session was introduced to the experi- 
menter, who established rapport, and then administered the recognition pre-test. 
During this testing period the words were presented alone and one at a time upon the 
screen, and the child was given up to 20 seconds to respond with the name of the word. 
The experimenter did not comment on failures to respond, nor did he correct any 
errors. Instead he attempted to maintain motivation and encourage persistence on 
the task with comments such as “Good try, you're doing well” and “You're trying 
very hard, I'm very pleased with you”. At the completion of pre-testing the experimenter 
recorded the number of correct responses, familiarised the child with the 28 
pictures (if allocated to a condition using pictures) and prepared the stimulus materials 
for use during word acquisition. During this phase of the session a correct response 
was supplied immediately following each slide presentation, the child being told “Now 
I'll show you the words again, but this time if you don't know a word, I'll tell you 
what it is. Are you ready?” When each word was presented, the experimenter said 
clearly “The word is . . .” (or, if the child responded correctly, “Yes, the word 
is . . .”). The experimenter took care not to refer to the pictures during this phase. 
The procedure for the recognition test which followed the word acquisition phase 
followed that for the pre-test, and the number of correct responses was again recorded. 


Results 

The correct response data were averaged for each experimental condition, and 
the obtained mean number of correct word-naming responses for both the pre-test 
and the post-test are shown in Table 1. The scores obtained on the post-test were 
adjusted for the initial inter-group differences observed on the pre-test (i.e., pre-test 
scores were allowed to co-vary with post-test scores), and the adjusted values are 
illustrated in Figure 1, which shows the percentage of correctly recognised words for 
the six conditions formed by combining the factors simultaneous-spaced and 
alone-related-unrelated. Observation of Figure 1 suggested firstly that the average number 
of words learnt did not differ for the conditions word-alone, word-related-picture and 
word-unrelated-picture, and secondly, that the spaced mode of presentation improved 
performance. An analysis of covariance supported the former observation, that is, 
the differences between the alone, related and unrelated conditions were not significant 
(F(2,53) = 0.14, P > 0.25). However, the suggested superiority for the spaced 
conditions approached but did not reach statistical significance (F(1,53) = 3.44, 
0.10 > P > 0.05). The results of this analysis also indicated that there was no interaction 
between these two factors (F(2,53) = 0.07, P > 0.25). 


TABLE 1 


OBTAINED MEAN NUMBER OF CORRECT WORD-NAMING RESPONSES FOR PRE-TEST 
AND POST-TEST IN EACH EXPERIMENTAL CONDITION (EXPERIMENT 1) 


                                   Experimental Condition 

Mode of            Word Alone           Related Picture       Unrelated Picture 
Presentation    Pre-Test  Post-Test   Pre-Test  Post-Test   Pre-Test  Post-Test 
Simultaneous      5.20      7.00        3.50      5.10        1.60      2.60 
Spaced            4.10      6.50        3.70      6.30        3.10      5.40 
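
As a rough descriptive companion to the analysis of covariance, the raw pre-test and post-test means in Table 1 can be turned into simple gain scores. The sketch below is not the authors' analysis: it ignores the covariance adjustment, reads the table entries as decimals (for example 5.40 for the spaced unrelated-picture post-test), and uses illustrative names throughout.

```python
# Table 1 means as (pre-test, post-test) pairs, read as decimals.
TABLE_1_MEANS = {
    "Simultaneous": {"alone": (5.20, 7.00), "related": (3.50, 5.10), "unrelated": (1.60, 2.60)},
    "Spaced": {"alone": (4.10, 6.50), "related": (3.70, 6.30), "unrelated": (3.10, 5.40)},
}


def gains(mode: str) -> dict:
    """Raw gain (post-test mean minus pre-test mean) for each picture condition."""
    return {cond: post - pre for cond, (pre, post) in TABLE_1_MEANS[mode].items()}


def mean_gain(mode: str) -> float:
    """Unweighted average gain across the three picture conditions (equal group sizes)."""
    values = list(gains(mode).values())
    return sum(values) / len(values)


for mode in TABLE_1_MEANS:
    per_condition = ", ".join(f"{c}: {g:.2f}" for c, g in gains(mode).items())
    print(f"{mode}: {per_condition}; mean gain = {mean_gain(mode):.2f} words")
```

Raw gains of this kind only describe the direction of the spaced-presentation difference; the significance statements in the text rest on the covariance-adjusted scores.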


FIGURE 1 


THE PERCENTAGES OF CORRECT WORD-NAMING RESPONSES WITH POST-TEST SCORES ADJUSTED FOR 
INITIAL INTER-GROUP DIFFERENCES OBSERVED ON THE PRE-TEST, FOR THE COMBINATION OF THE ALONE, 
RELATED, AND UNRELATED CONDITIONS WITH THE TWO MODES OF PRESENTATION 
Discussion 

The children in this study found the difficulty of the word-acquisition task to be 
independent of whether the words were presented alone, were accompanied by their 
pictorial representation, or were accompanied by an unrelated picture, that is, there 
were no differences between the adjusted post-test scores in these three conditions 
(F(2,53) = 0.14, P > 0.25). This consistent performance for alone, related and 
unrelated conditions was not influenced by the method of presenting the word and the 
picture, that is, there was no interaction between the two factors simultaneous-spaced 
and alone-related-unrelated (F(2,53) = 0.07, P > 0.25). This finding contrasts with the 
results of previous studies which indicate that learning single words is inhibited by the 
presence of a related picture (e.g., Harzem et al., 1976; Samuels, 1967). It is not clear 
why the presence of the related pictures did not inhibit performance in this study, but 
it is possible that the relatively large size of the word (which occupied the bottom 
one-third of the projector slide) and/or the large size, per se, of its projected image, 
attracted the child's attention. In other words, the children in this study, rather than 
attending mainly to the picture appear to have processed the word in sufficient detail 
for their performance in the related condition to match that achieved in the alone 
condition. 


The adjusted post-test naming responses (see Figure 1) suggested that the spaced 
mode of presentation facilitated learning. This is a potentially confusing result: for 
example there are no obvious reasons why learning in the alone condition, where the 
presentation of the word was followed by the superimposition of a blank slide, should 
be superior to learning in the alone condition where the blank slide was absent. 
However, since the observed differences failed to reach statistical significance (F(1,53) 
= 3.44, 0.10 > P > 0.05), their explanation must be contingent upon further empirical 
work. 


EXPERIMENT 2 

Introduction 

The present study was designed to investigate the above suggestion that the large 
size of the projected images had affected the results. Children were again asked to 
learn words presented alone or in the presence of either a related or unrelated picture, 
and these pictures were again presented at the same time or one second after the 
presentation of the word. However, the large projected images were replaced by more 
familiar word-cards. 


Method 

Seventy subjects were randomly selected from four kindergarten classes (135 
pupils) in a medium-sized metropolitan government primary school (total enrolment 
652). Selected children were randomly allocated to five experimental groups, with the 
restriction that each group should contain seven boys and seven girls. Each child took 
part in a single experimental session of 21 to 37 minutes' duration, and as previously 
detailed, these sessions were made up of an adjustment period, a recognition test 
(pre-test) of the words, familiarisation with the pictures (for those children not in the 
word-alone condition), an acquisition phase, and a further recognition test (post-test). 
Two groups of children participated in experimental conditions where pictures were 
presented simultaneously with words, two groups participated in conditions where 
pictures followed words after an interval of one second, and the remaining group 
learnt words presented alone. Within the simultaneous and spaced conditions, one 
group of children was presented with words and unrelated pictures (unrelated pictures 
were again selected at random from the 28 pictures minus the one representing the 
word to be learnt), and the other group was presented with words in the presence of 
their pictorial representations. 

Cards were used for the presentation of the words and the pictures. The stimulus
words were handprinted on white cardboard sheets (25.50 cm × 13.50 cm) in lower case
letters, and the pictures of the ‘things’ referred to by the words were pasted on
cardboard sheets twice the size of the word cards (27.00 cm × 25.50 cm). This resulted
in a size ratio of word to picture of one to two, the same as that used in the first 
experiment. The sets of word and picture cards were each numbered on the reverse 
side to facilitate selection and manipulation during the various phases of the experi- 
ment. Specifically, word cards were held and shown to the children in the word-alone 
condition, and in the picture conditions the experimenter held the picture card above 
the word card. In these latter conditions word and picture cards were either held 
together and presented as a composite (i.e., simultaneous conditions), or the word card 
was presented first and after an interval of one second the picture card was held above 
it (i.e., spaced conditions). The experimental procedure was otherwise as described
for Experiment 1. 


Results 

The correct response data were averaged for each experimental condition, and the 
obtained mean number of correct word-naming responses for both the pretest and the 
post-test are shown in Table 2. The scores obtained on the post-test were adjusted 
for the initial inter-group differences observed on the pre-test in the manner previously
described in Experiment 1. That is, pre-test scores were allowed to co-vary with
post-test scores.


TABLE 2


OBTAINED MEAN NUMBER OF CORRECT WORD-NAMING RESPONSES FOR PRE-TEST
AND POST-TEST IN EACH EXPERIMENTAL CONDITION (EXPERIMENT 2)


                              Experimental Condition

Mode of            Word Alone          Related Picture      Unrelated Picture
Presentation    Pre-Test Post-Test   Pre-Test Post-Test   Pre-Test Post-Test
Simultaneous      1.86     3.86        5.07     7.93        4.14     6.29
Spaced             -        -          5.00     9.50        5.64     8.64


Adjusted post-test scores are illustrated in Figure 2, which shows the
percentage of correctly recognised cards for the five experimental conditions. Obser- 
vation of Figure 2 suggested that while there may not have been a main effect of either 
spacing or type of picture, children in the spaced-related-picture condition appear to 
have recognised more words than children in the other conditions. However, an 
analysis of covariance did not indicate that performance was superior in the spaced- 
related-picture condition. That is, the interaction between the factors of spacing and 
type of picture failed to reach significance (F(1,64) = 1.02, P > 0.25), and in addition,
a Dunnett’s test which compared average performance in the spaced-related-picture
condition with average performance in the word-alone (control) condition did not
detect a difference (t(64) = 1.83, two-tailed critical value = 2.51, P > 0.05). On the
other hand, the observation of no main effect for both spacing and type of picture
was supported by the analysis, in that the differences between both the spaced and
simultaneous conditions and between the related-picture and unrelated-picture
conditions failed to reach significance (F(1,64) = 2.79, P > 0.05; and F(1,64) = 2.96,
P > 0.05 respectively).
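

The adjustment of post-test scores for pre-test differences can be illustrated with a minimal sketch. It assumes the usual analysis-of-covariance adjustment, in which each group's post-test mean is corrected by the pooled within-group regression slope multiplied by the group's deviation from the grand pre-test mean. The paper does not report the slope or the raw scores, so the function name, the condition labels and the scores below are hypothetical.

```python
from statistics import mean


def adjusted_means(groups):
    """Return covariance-adjusted post-test means.

    `groups` maps a condition name to a list of (pre, post) score pairs.
    """
    # Pooled within-group slope of post-test on pre-test scores.
    sxy = 0.0
    sxx = 0.0
    for pairs in groups.values():
        pre_mean = mean(p for p, _ in pairs)
        post_mean = mean(q for _, q in pairs)
        sxy += sum((p - pre_mean) * (q - post_mean) for p, q in pairs)
        sxx += sum((p - pre_mean) ** 2 for p, _ in pairs)
    slope = sxy / sxx

    # Grand mean of the covariate (pre-test) over all children.
    grand_pre = mean(p for pairs in groups.values() for p, _ in pairs)

    return {
        name: mean(q for _, q in pairs)
        - slope * (mean(p for p, _ in pairs) - grand_pre)
        for name, pairs in groups.items()
    }


# Hypothetical scores, not data from the experiment.
example = {
    "word_alone": [(1, 2), (2, 4), (3, 6)],
    "spaced_related": [(3, 7), (4, 9), (5, 11)],
}
adjusted = adjusted_means(example)
```

In this invented example the raw post-test means differ by five words, but most of that gap disappears once the groups' unequal pre-test means are taken into account.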


Discussion 

The data obtained in this study did not suggest that the pictures inhibited the 
process of word learning, and a planned comparison of the word-alone (control) 
condition with the average of the four picture conditions failed to detect a difference
(F(1,64) = 0.15, P > 0.25).


FIGURE 2

THE PERCENTAGE OF CORRECT WORD-NAMING RESPONSES WITH POST-TEST SCORES ADJUSTED FOR
INITIAL INTER-GROUP DIFFERENCES OBSERVED ON THE PRE-TEST, FOR THE WORD-ALONE CONDITION
AND THE COMBINATION OF RELATED AND UNRELATED PICTURE CONDITIONS WITH THE TWO MODES OF
PRESENTATION


This finding is consistent with the results obtained in
Experiment 1, that is, the children in both studies found the difficulty of the word- 
acquisition task to be independent of whether the words were presented alone, were 
accompanied by their pictorial representation or were accompanied by an unrelated 
picture, and as a consequence, it also contrasts with the results of previous studies. 
It was suggested in the discussion of Experiment 1, that it may have been possible to 
account for these different findings by considering the attention capturing capacity of 
large projected images. This explanation is no longer tenable. How then can we 
account for the absence of an inhibiting effect in our experiments? Perhaps the 
different results are a consequence of different experimental procedures. Specifically, 
children taking part in the earlier experiments were given repeated sets of learning and 
testing trials. In contrast, our subjects attended only one set of pre-test, learning and
post-test trials, and it is therefore possible that differences similar to those reported 
previously may have emerged if we had adopted a similar repeated learning and 
testing procedure. 


The differences between the adjusted post-test naming responses obtained in the 
spaced conditions and those obtained in the simultaneous conditions did not reach 
significance (F(1,64) = 2.79, P > 0.05), and, in addition, the interaction between spacing
and type-of-picture also failed to reach significance (F(1,64) = 1.02, P > 0.25). These
results are consistent with the finding obtained in Experiment 1, which showed that
the observed superiority favouring the spaced conditions of presentation (see Figure 1)
was not statistically significant (F(1,53) = 3.44, 0.10 > P > 0.05), and it is therefore
tempting to conclude that since presenting the word prior to the picture does not result
in significantly improved performance, the divided attention hypothesis should be
abandoned as an explanation of the negative effects of pictures. This conclusion may
however be premature. The non-significant differences which did occur consistently
favoured the spaced mode of presentation, particularly in the conditions using related 
pictures (see Figures 1 and 2), and in addition, since pictures did not inhibit per- 
formance in these experiments it is possible to argue that there was insufficient 
opportunity for the effects of spacing to emerge. It is therefore suggested that the 
possible effects of spacing warrant further empirical examination. 


It should be noted at this stage that although there was no trend in the data 
obtained in either Experiment 1 or Experiment 2 to suggest that pictures necessarily 
inhibit performance, the experimental manipulations failed to produce evidence in 
support of pictorial elaboration as an aid to learning. 


EXPERIMENT 3 

Introduction 

Spacing between the presentation of the word and the presentation of the picture 
was again manipulated in this study, and in addition, we examined the possibility that 
drawing the child's attention to the fact that the word and the picture were related (in 
that they represented the same thing), would facilitate effective use of the pictorial 
information. That is, studies of the use of imagery as an aid to learning have shown 
that children younger than seven years experience difficulty in associating visual 
representations of concrete words. They can, however, benefit from the use of 
imagery if three-dimensional representations of the words are used in demonstrations 
or are actually manipulated by them (Varley et al., 1974; Wolff and Levin, 1972). In
the present context, this suggests that it cannot be assumed that the child necessarily 
associates the two visual inputs, word and picture. In other words, it may be the case 
that the child processes the word and the picture as unrelated inputs, unless positive 
intervention ensures that he/she consciously associates the two as representations of 
the same thing. 


Method 

Sixty-four subjects were randomly selected from three kindergarten classes (102 
pupils) in a medium-sized metropolitan government primary school (total enrolment 
525). Selected children were randomly allocated to four experimental groups, with 
the restriction that each should contain 8 boys and 8 girls. Each child took part in a 
single experimental session of 23 to 44 minutes’ duration, and, as previously detailed
in Experiments 1 and 2, this session comprised an adjustment period, a recognition 
test (pre-test) of the 28 stimulus words, familiarisation with the pictures, an acquisition 
phase, and a further recognition test (post-test) of the given words. Words and related 
pictures were again presented either simultaneously or with the word followed after 
one second by a superimposed picture. No unrelated pictures were used and the 
word-alone condition was absent from this design. Two groups of children were
allocated to each of the simultaneous and spaced conditions, and, during the acquisi- 
tion phase, one group in each condition received instructions which emphasised the 
fact that the picture was a visual representation of the object named by the word. 


The materials and apparatus were those described for use in Experiment 1. That 
is, as the results of Experiment 2 had failed to produce evidence to indicate that the 
effects of the experimental manipulations were dependent upon whether the words and 
pictures were viewed on a screen or cards, projected images were again used. Com- 
posite word-picture slides were projected from a single projector in the simultaneous 
conditions, and two slides, a word from the first projector followed by a related 
picture from the second projector, were projected in the spaced conditions. The 
experimental procedure was otherwise equivalent to the previous experiments. 


Results 

The correct response data were again averaged for each experimental condition, 
and the obtained mean number of correct word-naming responses for both the pre-test 
and the post-test are shown in Table 3. Scores obtained on the post-test were adjusted 
for the initial inter-group differences observed on the pre-test, and the adjusted values 
are illustrated in Figure 3, which shows the percentage of correctly recognised words 
for the four conditions formed by combining the factors simultaneous-spaced and 
association-no association. 


TABLE 3


OBTAINED MEAN NUMBER OF CORRECT WORD-NAMING RESPONSES FOR PRE-TEST
AND POST-TEST IN EACH EXPERIMENTAL CONDITION (EXPERIMENT 3)


                         Experimental Condition

Mode of              Association          No Association
Presentation      Pre-Test Post-Test   Pre-Test Post-Test
Simultaneous        5.31     8.81        7.15    10.31
Spaced              4.13     7.38        4.31     6.38


Observation of Figure 3 suggested, firstly, that while performance levels did not 
differ overall for the simultaneous and spaced conditions, children in the spaced-no 
association condition performed a little better than the corresponding group of 
children in the simultaneous-no association condition, and secondly, that learning was 
facilitated by drawing the child’s attention to the fact that the picture was a repre- 
sentation of the word. An analysis of covariance supported the suggestion that 
simultaneous and spaced conditions did not differ (F(1,59) = 0.36, P > 0.25), and
confirmed the observation that children in the association conditions made signifi-
cantly more correct post-test recognition responses than children in the no association
conditions (F(1,59) = 6.15, P < 0.05). There was, however, no suggestion that the
effects of spacing and nature-of-association interacted (F(1,59) = 0.20, P > 0.25), a
result which indicated that the small observed difference between the spaced-no 
association and simultaneous-no association conditions was not significant. 


Discussion 

Children in this study found the difficulty of the word-acquisition task to be 
unaffected by the method used to present the word and the picture. Specifically, there 
was no suggestion from the data (see Figure 3) that spacing per se facilitated learning 
(F(1,59) = 0.36, P > 0.25). There was, however, an observed but non-significant differ-
ence between spaced-no association and simultaneous-no association conditions.


FIGURE 3

THE PERCENTAGES OF CORRECT WORD-NAMING RESPONSES WITH POST-TEST SCORES ADJUSTED FOR
INITIAL INTER-GROUP DIFFERENCES OBSERVED ON THE PRE-TEST, FOR THE COMBINATION OF ASSOCIATION
AND NO ASSOCIATION CONDITIONS WITH THE TWO MODES OF PRESENTATION


This difference suggested that, provided no reference was made to the fact that both the
word and the picture represented the same thing, children who viewed the related 
picture one second after they had viewed the word, recognised more words than 
children who viewed the picture and the word at the same time. This
non-significant difference was consistent with similar tendencies observed in the data
reported in Experiments 1 and 2, and will be discussed in detail later. 


Performance was facilitated in those experimental conditions where the experi- 
menter ensured that the child attended to the fact that the picture was a representation 
of the visually presented word (F(1,59) = 6.15, P < 0.05). This result was similar to a
previous finding reported by Duell (1968), who found that children forced to ‘notice’
the printed word during training performed better than children who were trained in
a condition where a naming response could be made on the basis of the pictorial
information alone. She interpreted this finding in terms of a switch of attention from 
the picture to the word. However, it is likely that the training sequence used not only 
forced the child to process the word, but more importantly, encouraged the processing 
of word and picture as related inputs. The superior performance found in the 
association conditions was also consistent with the findings of Varley et al. (1974), and 
Wolff and Levin (1972), who showed that children younger than seven years were 
capable of using visual imagery as an aid to learning only when they were given 
explicit details of the integrated image to be formed. 


GENERAL DISCUSSION 


The three studies reported in this paper examined the acquisition of single words 
in the presence of pictorial information, and the results obtained are relevant to 
facilitating early reading development in schools. 


Effects of spacing. In all three experiments post-test recognition scores suggested
that the learning of single words was facilitated, in at least some circumstances, when 
their presentation to the child was followed after one second by the presentation of a 
picture. Analyses of the data in each of these experiments indicated that the observed 
differences failed to reach statistical significance, but it is possible to make relevant 
comparisons over the three experiments (see Cochran and Cox, 1957, Ch. 14). 
Specifically, in all three studies, recognition scores were recorded for a group of 
children who viewed a word followed by its pictorial representation and for a similar 
group who viewed the two together. These data were subjected to an experiment by 
spacing analysis of covariance, and the results indicated that the children in the spaced 
conditions did in fact learn more words than those in the simultaneous conditions 
(F(1,73) = 4.97, P < 0.05). There was also a significant difference between average
performance in the three experiments (F(2,73) = 5.24, P < 0.01), but as this effect did
not interact significantly with the effect of spacing (F(2,73) = 0.63, P > 0.25) it was not
considered relevant to the present discussion. 


Analysis of data from the three experiments showed that children who viewed the 
word followed by its pictorial representation, recognised more words during testing 
than those who viewed the two together (i.e., there was a significant effect of spacing).
The size of the advantage for the spaced presentations was, however, small (i.e., the
adjusted means for simultaneous and spaced conditions were 7.32 and 8.13 respect-
ively), and the statistical significance was obtained after post hoc analyses. The finding
should, therefore, be interpreted with caution. However, it does support the sugges- 
tion that when pictures inhibit the learning of words it is because they distract the 
child from the primary task of encoding the word, and in addition, since pre-test and 
post-test differences were generally small in these studies (see Tables 1, 2 and 3), 
spacing could prove to be of importance in situations where children learn more 
words. Consequently, if we contrast this small positive effect with the negative effects 
of pictures reported by Harzem et al. (1976) and Samuels (1967), teachers can be 
conservatively advised that the use, in the classroom, of a spaced mode of presentation 
is unlikely to inhibit and may well facilitate the acquisition of single words. 


Effect of association. In Experiment 3, children were able to use the relevant 
pictorial information to advantage provided they were given explicit instructions that 
associated the word and the picture. This result suggested that the failure of young 
children to use related pictures (e.g., Harzem et al., 1976), reflected, at least in part, a 
general difficulty encountered during the formation of integrated visual images 
(Varley et al., 1974; Wolff and Levin, 1972). It follows, therefore, that classroom 
teachers cannot assume that the spatial proximity of the visually presented word and 
related picture ensures that the young beginning reader will associate the two. In fact, 
teachers should give explicit instructions which indicate to the child that the word and
the picture represent the same object, and teaching should not proceed until the child 
has responded in a manner which indicates that the two inputs have been correctly 
related. 
CHILD-REARING ATTITUDES OF MOTHERS OF UNDER-,
AVERAGE-, AND OVER-ACHIEVING CHILDREN 


By C. N. BANNER* 
(Department of Educational Psychology, University of Calgary) 


Summary. The relationships between child-rearing attitudes of mothers and levels of 
academic achievement of their 11-year-old (grade six) elementary school children were 
examined. The sample included 191 mothers (103 mothers of sons and 88 mothers of 
daughters). Maternal attitudes were assessed by the Parental Attitude Research 
Instrument. The criterion of academic achievement for the children consisted of 
objective achievement test scores in reading plus mathematics. Level of academic 
achievement was defined in terms of the relationship between expected and actual 
achievement using the regression-equation method. Those items of the Parental 
Attitude Research Instrument, which discriminated significantly between mothers of 
under-, average-, and over-achievers, were subjected to factor analysis. The test data 
of the mothers of sons and mothers of daughters were processed separately. Four 
factors were identified for the mothers of sons and also four factors for the mothers of 
daughters. The results indicated that, compared with the mothers of average- and 
over-achievers, the mothers of under-achieving sons are more dominant, rigid, and 
restrictive in the sense of being possessive and intrusive, while the mothers of under- 
achieving daughters are more dominant, rigid and restrictive in terms of being protective. 


INTRODUCTION 


THE objective of this study was the examination of the relationships between the 
child-rearing attitudes of mothers and the academic achievement levels of their 
11-year-old (grade six) elementary school children. The important components of 
this study, compared with earlier research on this subject, pertain mainly to three 
crucial methodological issues. First, the level of academic achievement of the children
was not defined simply as a grade or score of a certain absolute magnitude, but in 
terms of the relationship between each child's actual academic achievement and the 
achievement expected on the basis of his measured intelligence, using the regression- 
equation method. Second, the criteria of academic achievement consisted of objective 
achievement test scores in reading plus mathematics, instead of the frequently used 
grades on report cards. Third, the test data of the mothers of sons, and the mothers 
of daughters, were processed separately, on the grounds that maternal attitudes which 
are associated with under-, average- or over-achievement of children may not 
necessarily be the same in relation to boys and girls. Sex as a significant differential 
variable in the relationship between personality and academic achievement in children 
has been clearly demonstrated by, among others, Entwistle and Cunningham (1968). 


Martin (1975), in his comprehensive review of research on parent-child relations, 
points out that mothers of high-achieving boys reported that they had expected various 
behaviours indicative of both independence and achievement at a younger age and 
also that they had given larger and more frequent reinforcements when their sons 
performed independently and successfully. For females, intellectual effort was 
associated with low levels of maternal protectiveness. He also mentions that high 
achievement may be related to relatively low levels of acceptance by parents, especially 
for girls. In other words, it may not be necessary for mothers to be nurturant in 
order to be effective models of achievement behaviour. The fact that independence 
and free self-expression are related positively to high achievement, while punishment 
and lack of reasoning are related negatively to academic achievement, was also found 
by Barton et al. (1974). Davids and Hainsworth (1967) reported a study which 
showed that under-achievers perceived their mothers as significantly higher on control 


* Name changed from K. M. Banreti-Fuchs. 
than high-achievers, although no differences in amount of control between mothers of
high- and low-achievers were found as perceived by the mothers themselves. This is 
an intriguing finding since it is not clear whether the reported discrepancy in perception 
of degree of maternal control is a misperception on the part of the mothers or on the 
part of their children. Heilbrun and Walters (1968) demonstrated that high maternal 
control combined with low nurturance was, indeed, associated with under-achievement, 
but high maternal control combined with high nurturance was associated with high 
achievement. Hilliard and Roth (1969) found that mothers of achievers were generally 
more accepting of their children than mothers of under-achievers. 


Thus, past research evidence seems to indicate, although not entirely un- 
ambiguously, that maternal attitudes fostering independence and acceptance of the 
child are associated with high levels of academic achievement, while protective 
attitudes limiting independence are related to low achievement. Further research, 
analysing in somewhat greater detail the important maternal attitudes related to 
academic achievement in children would be highly desirable. The present factor 
analytic study is meant to be a contribution to that goal. 


METHOD 


A sample of 191 mothers of grade six elementary school children was chosen, 
representing 103 mothers of boys and 88 mothers of girls. The mothers of the boys 
as well as the mothers of the girls were subdivided, on the basis of their children’s 
academic achievement classification, into three subgroups: (1) Mothers of under- 
achievers, (2) mothers of average-achievers, and (3) mothers of over-achievers. The 
test data of the boys and girls were processed separately, as were the responses of the 
mothers of boys and the mothers of girls. The classification of the children as under-, 
average- or over-achievers was based on the regression-equation method (Guilford, 
1956) using their verbal plus non-verbal raw scores on the Canadian Lorge-Thorndike 
Intelligence Test, Form 1, and their achievement test raw scores in reading plus 
mathematics on the Stanford Reading Test, Intermediate II, Form Y (1964) and the 
Canadian Test of Basic Skills (1970). Those children whose actual achievement test 
scores on reading plus mathematics (RM) were between plus and minus 0.7 standard
error of estimate from their expected RM scores, were classified as average-achievers
(A). Children whose actual RM scores were 0.7 standard error of estimate or more
above their expected RM scores, were classified as over-achievers (O), while children
whose actual RM scores were 0.7 standard error of estimate or more below their
expected RM scores, were classified as under-achievers (U). A cut-off of 0.7 standard
error of estimate was used in order to conform to the procedures employed by the author
in his previous research projects on academic achievement (Banreti-Fuchs, 1972, 1975;
Banreti-Fuchs and Meadows, 1976).


All mothers completed the Parental Attitude Research Instrument (PARI).
Subsequently, item analysis was carried out on the responses of the three groups,
namely the mothers of under-achieving boys (MUB), mothers of average-achieving
boys (MAB) and mothers of over-achieving boys (MOB), to the Parental Attitude
Research Instrument. The same procedure was also followed with the mothers of the
girls, which group was also subdivided into mothers of under-achieving girls (MUG),
mothers of average-achieving girls (MAG), and mothers of over-achieving girls
(MOG). Those items of the PARI which discriminated significantly (P < 0.05, two-
tailed) between the three groups of mothers (MUB, MAB, MOB; and MUG, MAG,
MOG) were subjected to factor analysis using the principal component solution and
the varimax rotation method (Kaiser, 1958). In deciding which items to include in
the interpretation of the factors, a factor loading of ±0.40 was chosen as the cut-off
score. In this way, two sets of factors were identified, one for mothers of boys and 
the other for mothers of girls. 


RESULTS AND DISCUSSION

Mothers of boys 
The sample of 103 boys represented 20 under-achievers, 62 average-achievers, and 
21 over-achievers. As expected, the three groups did not differ from each other 
significantly in age or intelligence, but were significantly different in terms of their 
academic achievement levels (Table 1). 


TABLE 1


MEAN AGE, INTELLIGENCE, AND ACADEMIC ACHIEVEMENT SCORES OF THE UNDER-,
AVERAGE-, AND OVER-ACHIEVING BOYS AND GIRLS


                           Under-            Average-          Over-
Achievement                Achievers         Achievers         Achievers
               Boys        N=20              N=62              N=21
               Girls       N=18              N=48              N=22
Variables                  Mean     SD       Mean     SD       Mean     SD       P*
Age            Boys        11.85    0.37     11.82    0.53     11.76    0.44     NS
               Girls       11.78    0.43     11.75    0.53     11.91    0.29     NS
Intelligence   Boys       102.30   13.44    100.48   22.93    103.24   19.32     NS
               Girls       99.17   18.35    101.37   27.16    107.64   21.52     NS
Academic       Boys        49.20   11.77     68.69   16.16     79.29   11.20     0.002
achievement    Girls       51.00   11.52     63.83   16.71     80.00   12.12     0.001


* Analysis of variance.


Item analysis of the responses of mothers of boys to the Parental Attitude Research
Instrument revealed that 12 items discriminated significantly between the mothers of
the under-, average-, and over-achieving boys (Appendix 1). Factor analysis indicated
four factors which account for 54.7 per cent of the variance (Table 2).
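

As a check on the figures in Table 2, the communality of each item can be computed as the sum of its squared loadings on the four factors, and the total variance accounted for as the mean communality over the 12 items. The sketch below uses the loadings as read from Table 2; the signs of a few entries are hard to read in the source, but signs do not affect squared values. The variable and function names are illustrative only.

```python
# Loadings on Factors 1-4 for the 12 items, keyed by PARI item number,
# as read from Table 2 (mothers of sons).
loadings = {
    4: (0.03, 0.73, -0.07, 0.08),
    12: (-0.45, 0.48, -0.19, 0.38),
    15: (-0.01, 0.17, 0.24, 0.57),
    16: (0.12, 0.46, 0.28, 0.17),
    27: (0.08, -0.07, 0.05, 0.79),
    39: (0.75, 0.19, 0.00, -0.04),
    41: (0.36, 0.30, 0.06, 0.49),
    49: (0.58, -0.14, 0.04, 0.41),
    72: (0.60, 0.16, 0.14, 0.09),
    88: (0.20, 0.69, 0.03, -0.09),
    89: (0.03, 0.02, 0.81, 0.23),
    112: (0.08, 0.06, 0.88, 0.04),
}


def communality(row):
    """Sum of squared loadings of one item across all factors."""
    return sum(value * value for value in row)


# Communality of each item, comparable with the h² column of Table 2.
h2 = {item: round(communality(row), 2) for item, row in loadings.items()}

# Percentage of total variance accounted for by the four factors.
total_pct = 100 * sum(communality(row) for row in loadings.values()) / len(loadings)
```

Computed this way, the total agrees closely with the 54.7 per cent reported, which suggests that the loadings and communalities in the table are mutually consistent.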


TABLE 2


ROTATED FACTOR MATRIX OF ITEMS DISCRIMINATING BETWEEN MOTHERS OF
UNDER-, AVERAGE-, AND OVER-ACHIEVING SONS


                                            Factors

Items                    Item No.      1       2       3       4       h²
Children bad                 4        .03     .73    -.07     .08     .54
Avoid fighting              12       -.45     .48    -.19     .38     .62
No waste of time            15       -.01     .17     .24     .57     .41
Complaining                 16        .12     .46     .28     .17     .33
Get rid of mischief         27        .08    -.07     .05     .79     .63
Talk about problems         39        .75     .19    -.00    -.04     .61
No nudity                   41        .36     .30     .06     .49     .46
Well-run home               49        .58    -.14     .04     .41     .52
Mother’s place              72        .60     .16     .14     .09     .41
Mother in charge            88        .20     .69     .03    -.09     .53
Mother should know          89        .03     .02     .81     .23     .71
Innermost thoughts         112        .08     .06     .88     .04     .80
Percentage of variance:              22.6    12.4    11.1     8.5


Total variance accounted for: 54.7 per cent


Factor 1 accounts for 22.6 per cent of the variance and identifies attitudes related
to restrictiveness. Mothers of under-achievers, as compared with mothers of average-
and over-achievers, tend to be more restrictive in terms of allowing their sons to be 
self-assertive, they restrict their sons’ self-expression more, and have a more restrictive 
view of a mother’s proper role definition and activities. 


Factor 2 accounts for 12.4 per cent of the variance and identifies attitudes related
to dominance. Mothers of under-achieving boys tend to be more punitive, restrict
their sons’ self-assertiveness more, show a greater lack of acceptance of weakness, and
are more domineering. 


Factor 3 accounts for 11.1 per cent of the variance and identifies attitudes related
to intrusiveness. Mothers of under-achievers tend to feel more that they have to know 
absolutely everything about their child, who is not entitled to his privacy. 


Factor 4 accounts for 8.5 per cent of the variance and identifies attitudes related
to intolerance. Mothers of under-achievers tend to be less tolerant toward their sons’ 
mischievous behaviour and leisure activities, they have a more restrictive view of a 
mother’s proper role and activities, and are less tolerant toward nudity in children. 


It seems, therefore, that the attitudes of mothers of under-achieving boys are 
characterised by a higher degree of restrictiveness, dominance, intrusiveness, and 
intolerance. These results appear to be substantially in agreement with the attitudes 
characteristic of mothers of under-achievers as described by the various researchers 
mentioned in the introductory section. What seems to emerge perhaps even more 
strongly in this factor analysis is the oppressively rigid, possessive, and intolerant 
nature of the attitudes of mothers of under-achieving boys. 


Mothers of girls 

The sample of 88 girls represented 18 under-achievers, 48 average-achievers, and 
22 over-achievers. As expected, the three groups did not differ from each other 
significantly in age or intelligence, but were significantly different in terms of their 
academic achievement levels (Table 1). Item analysis of the responses of the mothers 
of girls to the Parental Attitude Research Instrument revealed that nine items dis-
criminated significantly between the mothers of the under-, average-, and over-
achieving girls (Appendix 2). Factor analysis indicated four factors which account
for 65.0 per cent of the variance (Table 3).


TABLE 3


ROTATED FACTOR MATRIX OF ITEMS DISCRIMINATING BETWEEN MOTHERS OF
UNDER-, AVERAGE-, AND OVER-ACHIEVING DAUGHTERS


                                            Factors

Item                     Item No.      1       2       3       4       h²
Shelter child                2        .20     .80     .05     .25     .74
Avoid disappointment        25        .63     .16     .22     .08     .48
Talk about problems         39        .24    -.64     .07     .28     .56
Mother strong               42        .84    -.02    -.14    -.01     .72
Mild discussion             76        .01     .01     .03     .93     .86
Parents’ thinking           79        .44     .11     .62    -.20     .63
Loyalty to parents         103        .38     .38     .58    -.11     .64
Having one’s way           106        .09     .22    -.81    -.27     .79
Make up stories            108        .60    -.1      .27     .00     .44
Percentage of variance:              27.0    14.7    12.2    11.2


Total variance accounted for: 65.0 per cent


Factor 1 accounts for 27.0 per cent of the variance and identifies attitudes related
to dominance. Mothers of under-achieving girls tend to be more protective,
domineering, intolerant toward criticism by their child, and lacking in understanding of the
child’s emotional problems. 
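The communality column of Table 3 can be checked against the loadings and against the total variance quoted in the text. The sketch below is only an illustration: the numbers are the values as read from the printed table (so they may carry transcription error), and the function names are hypothetical. It assumes the usual definitions, namely that an item's communality is the sum of its squared loadings and that the proportion of variance reproduced by the factors is the mean communality.

```python
# Communalities (h^2) as read from the last column of Table 3, one per item.
# ASSUMPTION: these are transcribed from the printed table and may be slightly off.
COMMUNALITIES = {
    "Shelter child": 0.74,
    "Avoid disappointment": 0.48,
    "Talk about problems": 0.56,
    "Mother strong": 0.72,
    "Mild discussion": 0.86,
    "Parents' thinking": 0.63,
    "Loyalty to parents": 0.64,
    "Having one's way": 0.79,
    "Make up stories": 0.44,
}


def communality(loadings):
    """Sum of squared loadings of one item across the retained factors."""
    return sum(x * x for x in loadings)


def total_variance_explained(communalities):
    """Mean communality, i.e. the share of standardised variance reproduced."""
    values = list(communalities)
    return sum(values) / len(values)


# Loadings of "Avoid disappointment" on Factors 1 to 4, as read from Table 3.
avoid_disappointment = [0.63, 0.16, 0.22, 0.08]
print(round(communality(avoid_disappointment), 2))
print(round(100 * total_variance_explained(COMMUNALITIES.values()), 1))
```

The mean communality comes out within rounding error of the 65.0 per cent quoted for the four factors together.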


Factor 2 accounts for 14.7 per cent of the variance and identifies attitudes related 
to protectiveness. Mothers of under-achievers tend to be more protective and 
restrictive of the self-expression of their daughters. 


Factor 3 accounts for 12.2 per cent of the variance and identifies attitudes related 
to authoritarianism. Mothers of under-achievers tend to be more intolerant toward 
criticism from their daughters, they demand more submissiveness, and have a generally 
much less democratic attitude. 


Factor 4 accounts for 11.2 per cent of the variance and identifies a rather directive 
attitude, a tendency on the part of mothers of under-achievers to impose their will to a 
greater extent than mothers of average- and over-achieving girls. 


The attitudes of mothers of under-achieving girls may be summarised, therefore, 
by the factors dominance, protectiveness, authoritarianism, and directiveness. These 
four factors appear to be very similar to the factors identified for mothers of under- 
achieving boys. The only major difference seems to be the factor protectiveness 
characterising the attitude of mothers of under-achieving girls. 


In summary, both factor analyses seem to support the idea that the attitudes of 
mothers of under-achievers are characterised by more rigid dominance, intolerance, 
and restriction of independence of their children. However, these maternal attitudes 
appear to manifest themselves somewhat differently in mothers of sons compared with 
mothers of daughters. Mothers of under-achieving sons seem to be more rigid and 
restrictive in the sense of being possessive and intrusive, while mothers of under- 
achieving daughters are more rigid and restrictive in terms of being protective. 
APPENDIX 1 

PERCENTAGE DISTRIBUTION OF THE RESPONSES TO THE PARI BY MOTHERS OF 
UNDER-, AVERAGE-, AND OVER-ACHIEVING SONS 

Item 

Some children are just so bad they must be taught to fear adults for their own 
good. (SD) 

A child should be taught to avoid fighting no matter what happens. 

There are so many things a child has to learn in life there is no excuse for him 
sitting around with time on his hands. (SA) 

If you let children talk about their troubles they end up complaining even 
more. (SD) 

It is frequently necessary to drive the mischief out of a child before he will 
behave. (SD) 

Parents who start a child talking about his worries don’t realise that sometimes 
it’s better to just leave well enough alone. (MA) 

It is very important that young boys and girls not be allowed to see each other 
completely undressed. (SD) 

A woman has to choose between having a well-run home and hobnobbing around 
with neighbours and friends. (MA) 

Too many women forget that a mother's place is in the home. (SD) 

The whole family does fine if the mother puts her shoulders to the wheel and 
takes charge of things. (SD) 

A mother has a right to know everything going on in her child's life because her 
child is part of her. (SD) 

It is a mother's duty to make sure she knows her child's innermost thoughts. (SD) 


SD = Strongly Disagree 
MD = Mildly Disagree 
MA = Mildly Agree 
SA = Strongly Agree 


APPENDIX 2 

PERCENTAGE DISTRIBUTION OF THE RESPONSES TO THE PARI BY MOTHERS OF 
UNDER-, AVERAGE-, AND OVER-ACHIEVING DAUGHTERS 

Item 

A good mother should shelter her child from life’s little difficulties. 

A mother should do her best to avoid any disappointment for her child. (SD) 

Parents who start a child talking about his worries don’t realise that sometimes 
it’s better to just leave well enough alone. (SD) 

Children and husbands do better when the mother is strong enough to settle most 
of the problems. (SD) 

There are some things which just can’t be settled by a mild discussion. (SA) 

The child should not question the thinking of his parents. (SD) 

Loyalty to parents comes before anything else. (SD) 

There is no reason parents should have their way all the time, any more than 
that children should have their own way all the time. 
D 

The trouble with giving attention to children's problems is they usually just 
make up a lot of stories to keep you interested. (SD) 


SD = Strongly Disagree 
MD = Mildly Disagree 
MA = Mildly Agree 
SA = Strongly Agree 
A LONGITUDINAL STUDY USING THE WPPSI 
AND WISC-R WITH AN ENGLISH SAMPLE 


By DOROTHY BISHOP 
(Department of Neurology, Churchill Hospital, Oxford) 


AND G. E. BUTTERWORTH 
(Department of Psychology, University of Southampton) 


Summary. An unselected sample of 139 English children from a rural area completed 
the WPPSI at the age of 4½ years and the WISC-R four years later. Comparison of 
WISC-R results for this sample and Wechsler’s standardisation sample indicated that 
although this test yields IQ scores close to expected values, distributions of scores on 
some subtests, especially Similarities and Coding, are less satisfactory. Significant sex 
differences, in keeping with previous research, were found on some WPPSI and WISC-R 
subtests. The association between verbal-performance discrepancies on the two tests 
was statistically significant but relatively weak. The results indicated that children 
who refused to co-operate with one or more of the WPPSI subtests were more likely than 
not to develop normally, but there was a raised incidence of WISC-R IQs below 85 in 
this group. A high proportion of low IQs was also found among children who missed 
the pre-school assessment because of family crises or parental refusal. 


INTRODUCTION 


The Wechsler Preschool and Primary Scale of Intelligence (WPPSI) (Wechsler, 1967), 
devised as a downward extension of the Wechsler Intelligence Scale for Children 
(WISC) (Wechsler, 1949), is widely used to assess general intelligence of children aged 
between 4 and 6½ years. The WPPSI is popular because, like Wechsler’s other scales, 
it has been adequately standardised and yields reliable estimates of IQ. However, 
although concurrent validity has been demonstrated by Wechsler (1967), little is known 
about the predictive power of the WPPSI. 


Research conducted decades ago with the Stanford-Binet scale warns of the 
dangers of assuming that high predictive validity is entailed by good test reliability and 
concurrent validity. Sontag et al. (1958), for example, concluded that: 


“... the preschool tests may be considered as unreliable for prediction of future 
status in IQ because of the nature of changes in mental growth found in a group 
rather than from the standpoint of unreliability due to error of measurement” 
(pp. 52-53). 

Does this mean that preschool estimates of intelligence are bound to be poor predictors 
of later functioning, or is it possible that a more recent test such as the WPPSI, which 
uses the concept of deviation IQ rather than mental age, will be more useful in this 
respect? 


Two studies conducted in the USA suggest that the ability of the WPPSI to 
predict later IQ might be impressive. However, both use atypical samples of children 
and relatively short follow-up intervals. Zimmerman and Woo-Sam (1972) cite an 
unpublished study by Austin and Carpenter in which the WISC was administered to a 
small sample of children who had been given the WPPSI one year earlier prior to 
enrolment in a Head Start programme. It is difficult to interpret the correlation of 
.91 found between the two tests without further details of the sample. Rasbury et al. 
(1977) tested 90 children from predominantly upper middle class homes, using the 
WPPSI at age 5½ years, and followed up with the revised version of the WISC 
(WISC-R) (Wechsler, 1974) about one year later. The social bias in the sample is 
reflected in the mean full scale IQ value of 119 for WPPSI and 115 for WISC-R. The 
correlation between the two tests was .75, rising to .94 when a correction for 
restriction of range was applied. 


In this study we report correlations between WPPSI and WISC-R over a four-year 
interval in a population of unselected children spanning a wide range of ability. 


A second issue of interest is the applicability of Wechsler's scales to British 
children. As far as the WPPSI is concerned, early data from this project (Fairweather 
and Butterworth, 1977) supported two previous studies using this test with English 
samples (Brittain, 1969; Neligan et al., 1976) in finding good agreement with 
Wechsler’s USA norms, although a further study by Yule et al. (1969) suggested that 
the WPPSI might slightly over-estimate intelligence in English children. 


Regarding the WISC, the Scottish Council for Research in Education (1967) 
standardised this test on Scottish children and reported that it could be used with 
confidence to assess intelligence in such a population. Jones (1962) obtained 
mean IQ values in keeping with expected values in an urban population of English 
children. 


In 1974, Wechsler produced a revised version of the 1949 WISC, known as the 
WISC-R, which was standardised on a new representative USA population. One 
purpose of this revision was to replace obsolete and unfair items, and about one 
quarter of WISC-R items are new or substantially modified from the WISC. A 
number of studies carried out in the USA indicate that IQs obtained with the WISC-R 
tend to be about 4 points lower than those obtained with the WISC when the same 
children are given both tests (Covin, 1976; Hamm et al., 1976; Schwarting, 1976; 
Davis, 1977; Doppelt and Kaufman, 1977; cf. Covin, 1977), but no data on British 
children have yet been reported. This project provided an opportunity to evaluate 
the applicability of the WISC-R to a sample of English 8½-year-olds. 


The data reported here were collected in the course of a study concerned with 
prediction of educational failure in preschool children. Further findings from this 
project, including data on medical history, reading ability and handedness, will be 
reported elsewhere. 


METHOD 

Sample 

The study population consisted of all children registered at a general medical 
practice who reached the age of 4½ years between April, 1972 and April, 1974. The 
practice is one of two in a small market town, and patients come from the town and 
surrounding countryside. Parents of children in the study population were approached 
and asked to co-operate with a preschool assessment programme. All but 9 parents 
brought their child for testing, giving a sample of 189 children who were seen. 89 per 
cent were seen between the ages of 4:4 and 4:8, with the age range for the total sample 
being 3:10 to 5:8. Parents were contacted again between March, 1976 and April, 1978 
for follow-up when the children were around 8½ years old. Twenty-seven families had 
moved away from the area, but in 8 cases it was nevertheless possible to retest the 
children. The remaining 19 children were untraceable or had moved too far for 
follow-up to be feasible. Three parents refused to co-operate with the retest. 92 per 
cent of children seen on the second occasion were aged between 8:4 and 8:8, the age 
range for the whole sample being 7:11 to 9:6. We also followed up 9 children who 
had missed the preschool assessment because of parental refusal (3 cases), family crises 
(4 cases) or clerical error (2 cases). 


Data from three children were excluded from consideration. In the first case, the 
child was chronically ill and missed so many appointments that he was much older 
than the rest of the sample when eventually seen; in the second, the child was 
profoundly deaf; in the third there was unequivocal evidence of brain damage. Results 
from the second test session only were used for one further child who had a moderate 
hearing loss at 4½ years. 


The social class distribution (Office of Population Censuses and Surveys, 1970) 
of the fathers of the 169 children seen on both occasions is shown in Table 1. This 
distribution is based on biological father's occupation at the time of the second testing, 
except in the case of adoptees (3 children) who were classified by adoptive father's 
occupation. Children of unmarried mothers and those with fathers in the armed forces 
are classified under ‘other’. Those whose father had died (3 cases) were classified 
according to his occupation when alive. The social class distribution of this sample 
does not differ significantly from expected values based on the distribution for married 
men in Great Britain (Office of Population Censuses and Surveys, 1973). 


TABLE 1 
SOCIAL CLASS DISTRIBUTION OF FATHERS 

Social class         Obtained    Expected* 
I                        9          8.5 
II                      26         32.1 
III non-manual          13         18.6 
III manual              75         62.5 
IV                      31         28.7 
V                        6         11.8 
Other                    9          6.8 
Total                  169        169 

* Based on Office of Population Censuses and Surveys (1973). 


This sample included 14 pairs of siblings, two pairs of whom were twins. All but 
one child were white. 


Materials and method 

At the first examination, a male psychologist (GB) administered the 10 main 
subtests of the WPPSI. At the second examination, a female psychologist (DB) 
administered the WISC-R (omitting Mazes). On both occasions standard British 
amendments recommended by the National Foundation for Educational Research 
were adopted. In the second session, tests of reading and handedness were given 
before the WISC-R. On both occasions, children were tested in a quiet room in the 
general practice. The psychologist who saw the children at 8½ years was unaware of 
background history or previous test results. 


RESULTS 


The bulk of the data reported below refers to the subset of children (65 boys and 
74 girls) for whom essentially complete results on WPPSI and WISC-R were available, 
henceforth termed the Main Sample. This sample included four children whose 
scores were prorated because they missed one or two subtests. The rest of the Main 
Sample completed all 10 subtests on both scales. 


The remainder of the sample may be divided into three groups: (i) those who 
were not traced for follow-up (N = 19); (ii) those seen only at 8½ years (N = 8); (iii) 
those seen on both occasions who refused to complete some or all of the WPPSI 
subtests (N = 30). Results for these groups outside the Main Sample will be presented 
separately below. 


D. V. M. BISHOP and G. E. BUTTERWORTH 159 


Mean values obtained on WPPSI and WISC-R 

Table 2 shows results obtained with the WPPSI and WISC-R. Means for boys 
and girls combined were compared with those of Wechsler’s standardisation samples 
(assuming a mean of 10 and standard deviation of 3 for subtests, and a mean of 100 
and standard deviation of 15 for IQ values), using a series of 2-tailed z-tests with a 
stringent alpha level (.005). Differences between this sample and the standardisation 
population reached significance for WPPSI Arithmetic (z = -3.17), WPPSI Similarities 
(z = 3.73), WISC-R Information (z = -3.72), WISC-R Similarities (z = -4.60), 
WISC-R Object Assembly (z = 4.32) and WISC-R Coding (z = -3.61), but no IQ 
comparisons reached significance for either WPPSI or WISC-R. The variances 
obtained with this sample tended to be smaller than those of the standardisation 
population, and for some WPPSI subtests and for WPPSI VIQ these differences 
reached significance (… F = 1.89; Arithmetic: F = 1.83; Comprehension: F = …76; 
VIQ: F = …). 
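The z values quoted above can be approximately reproduced from the Total column of Table 2. The sketch below is a reconstruction, not the authors' stated procedure: it treats each comparison as a two-sample z-test in which the standardisation group contributes a variance of 3² and this sample contributes its own observed variance. The normative group size of 200 is an assumption (it is not given in the text) chosen because it recovers the printed values; the function name is likewise hypothetical.

```python
import math

N_SAMPLE = 139   # Main Sample size reported in the text
N_NORM = 200     # ASSUMPTION: size of the normative age group; not stated in the text
SD_NORM = 3.0    # scaled-score SD assumed for the standardisation sample


def z_vs_norm(mean, sd, norm_mean=10.0):
    """Two-sample z for a sample subtest mean against the normative mean."""
    se = math.sqrt(SD_NORM ** 2 / N_NORM + sd ** 2 / N_SAMPLE)
    return (mean - norm_mean) / se


# (mean, SD) for boys and girls combined, from Table 2, with the z printed in the text.
SUBTESTS = {
    "WPPSI Arithmetic": (9.1, 2.22, -3.17),
    "WPPSI Similarities": (11.1, 2.41, 3.73),
    "WISC-R Information": (8.8, 2.86, -3.72),
    "WISC-R Similarities": (8.3, 3.57, -4.60),
    "WISC-R Object Assembly": (11.4, 2.89, 4.32),
    "WISC-R Coding": (8.8, 3.02, -3.61),
}

for name, (mean, sd, reported) in SUBTESTS.items():
    print(f"{name}: computed z = {z_vs_norm(mean, sd):.2f}, reported z = {reported:.2f}")
```

With a two-tailed alpha of .005 the critical value is about 2.81, so all six subtests exceed it under these assumptions.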
TABLE 2 
WPPSI AND WISC-R RESULTS: MAIN SAMPLE 

                          Girls (N = 74)     Boys (N = 65)      Total 
                          Mean    (SD)       Mean    (SD)       Mean    (SD) 

WPPSI 
Information                9.4   (3.01)      10.2   (2.43)       9.8   (2.77) 
Vocabulary                10.0   (2.26)      10.1   (2.10)      10.0   (2.18) 
Arithmetic                 8.8   (2.41)       9.3   (1.95)       9.1   (2.22) 
Similarities              11.0   (2.58)      11.2   (2.22)      11.1   (2.41) 
Comprehension             10.1   (2.44)      10.2   (2.06)      10.1   (?) 
Animal House               9.7   (2.62)       9.7   (2.12)       9.7   (2.40) 
Picture Completion         9.7   (2.41)      10.2   (2.85)      10.0   (2.62) 
Mazes                      9.0   (2.55)      10.8   (2.69)       9.8   (2.76) 
Geometric Design          10.5   (2.36)      10.1   (3.14)      10.3   (2.75) 
Block Design               9.8   (2.28)      10.8   (2.99)      10.3   (2.67) 
Verbal IQ (VIQ)           99.1  (11.33)     101.2   (9.83)     100.1  (10.67) 
Performance IQ (PIQ)      98.2  (11.43)     102.0  (13.91)     100.0  (12.74) 
Full Scale IQ (FSIQ)      98.6  (11.10)     101.8  (11.61)     100.1  (11.41) 

WISC-R 
Information                8.1   (2.?0)       9.6   (2.97)       8.8   (2.86) 
Vocabulary                10.4   (2.26)      10.7   (2.68)      10.6   (2.46) 
Arithmetic                 9.8   (2.72)      10.0   (2.84)       9.9   (2.77) 
Similarities               8.0   (3.43)       8.6   (3.72)       8.3   (3.57) 
Comprehension              9.7   (2.53)       9.8   (2.74)       9.7   (2.65) 
Picture Completion        10.4   (2.11)      11.2   (2.48)      10.8   (2.31) 
Block Design              10.0   (2.91)      11.3   (3.04)      10.6   (3.04) 
Picture Arrangement       10.1   (3.00)      10.4   (2.65)      10.2   (2.83) 
Object Assembly           11.0   (2.?0)      11.8   (?.96)      11.4   (2.89) 
Coding                     9.6   (2.83)       7.9   (2.99)       8.8   (3.02) 
Verbal IQ (VIQ)           94.5  (12.97)      97.8  (14.29)      96.1  (13.52) 
Performance IQ (PIQ)     101.3  (13.?7)     103.2  (14.43)     102.3  (13.94) 
Full Scale IQ (FSIQ)      97.4  (12.99)     100.3  (14.55)      98.8  (13.66) 

? = digit illegible in the source. 





Distributions of IQ scores for WPPSI and WISC-R are shown in Table 3. WPPSI VIQ, 
PIQ and FSIQ distributions differed significantly from expected values, reflecting the 
reduced variance in this sample compared to the standardisation sample. Although 
WISC-R VIQ, PIQ and FSIQ distributions did not differ significantly from expected 
values, there was a trend towards under-representation of scores at the high end of 
the VIQ scale, and under-representation of low scores for the PIQ scale. 
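The comparison of obtained with expected IQ distributions is a chi-square goodness-of-fit test. As a hedged illustration, the sketch below recomputes the statistic for the two VIQ columns of Table 3, using the frequencies as read from the printed table. It assumes six degrees of freedom (seven IQ bands, no parameters estimated from the sample), which the text does not state explicitly; the function name is hypothetical.

```python
# Expected frequencies for N = 139 under the normative IQ distribution (Table 3),
# listed from the highest IQ band (125 or over) to the lowest (74 or less).
EXPECTED = [7.2, 15.9, 30.0, 35.9, 28.9, 14.8, 6.2]

# Obtained VIQ frequencies in the same band order, as read from Table 3.
WPPSI_VIQ = [3, 8, 39, 49, 29, 8, 3]
WISCR_VIQ = [2, 9, 30, 41, 29, 17, 11]

# Tabled critical value of chi-square at the .01 level.
# ASSUMPTION: df = 6, i.e. number of bands minus one.
CRITICAL_01_DF6 = 16.81


def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over the bands."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


for label, observed in (("WPPSI VIQ", WPPSI_VIQ), ("WISC-R VIQ", WISCR_VIQ)):
    stat = chi_square(observed, EXPECTED)
    verdict = "P < .01" if stat > CRITICAL_01_DF6 else "not significant at .01"
    print(f"{label}: chi-square = {stat:.1f} ({verdict})")
```

On these figures the WPPSI distribution departs from expectation mainly through an excess of scores in the 95-114 range, whereas the WISC-R statistic stays below the critical value.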


TABLE 3 
DISTRIBUTIONS OF IQ SCORES FOR MAIN SAMPLE 

                                       Obtained frequency 
                Expected           WPPSI                   WISC-R 
IQ range        frequency     VIQ    PIQ    FSIQ      VIQ    PIQ    FSIQ 
125 or over        7.2          3      4      5         2      8      5 
115-124           15.9          8     12      3         9     15     12 
105-114           30.0         39     27     38        30     33     29 
95-104            35.9         49     56     49        41     46     40 
85-94             28.9         29     27     35        29     27     38 
75-84             14.8          8     11      7        17      6      9 
74 or less         6.2          3      2      2        11      4      6 

χ²                            18.6*  17.9*  21.9*     11.5    9.4    7.3 

* P < .01. 


To investigate the pattern of performance on those subtests where there were 
significant differences in means between this sample and the standardisation 
population, distributions of standard scores were plotted, and are shown in Figure 1. 
For WPPSI Arithmetic, WPPSI Similarities and WISC-R Object Assembly, the means and 
modes of the obtained distributions closely correspond. However, for WISC-R 
Similarities, the mode is 10, and the distribution closely follows the theoretical 
distribution for most of its range, but an unexpectedly high proportion of children 
obtain a scaled score of 1. Thus the low mean obtained by this sample does not 
reflect a general shift of the distribution of scores, but the contribution of a substantial 
minority of children who do very poorly indeed. Mean and mode also fail to 
correspond in WISC-R Coding, where the mode is 6. Here the picture is complicated by 
sex differences, which will be discussed below. 


An item analysis carried out on Similarities (using the total corpus of data from 
all children given this subtest), suggested that a different order of presentation or a 
different criterion for stopping would have resulted in fewer scaled scores of 1. If 
children who were not given later items are regarded as having failed those items, then 
the percentages of children scoring zero on the first six items are as follows: item 1: 
11 per cent; item 2: 24 per cent; item 3: 39 per cent; item 4: 19 per cent; item 5: 
16 per cent; item 6: 18 per cent. That is to say, items 5 and 6 are easier than earlier 
items, and children not given these items because of earlier failure might have passed 
them. Indeed in five cases testing was continued beyond the first 4 items after the 
criterion for discontinuing had been reached (because of uncertainty about scoring 
criteria) and in three such cases the children did earn points on items 5 and 6 (which 
could not however be credited). 


Another reason for the high incidence of scaled scores of 1 on Similarities was 
the failure of some children to grasp what was wanted of them. In particular, some 
children interpreted the question “In what way are a ... and a ... alike?” as “Are 
a ... and a ... alike?” or “In what way are a ... and a ... different?” The 
following extracts from protocols illustrate these patterns. 


(Child no. 78) (FS IQ = 87; similarities scaled score = 1) 

Tester: In what way are a wheel and a ball alike? How are they the same? 

Child: No. 

Tester: Well, they are both round and they both roll. Now tell me, in what way are 
a candle and a lamp alike? 

Child: Yes. 

Tester: In what way are a candle and a lamp alike? How are they the same? 

Child: Because they both shine (scores 1 point). 

Tester: In what way are a shirt and a hat alike? 

Child: No. 


FIGURE 1 

SUBTESTS WHERE MEANS DIFFER FROM 10. DISTRIBUTIONS OF SCALED SCORES (SHADED) 
COMPARED WITH EXPECTED VALUES (BOLD LINE) 


(Child no. 161) (FS IQ = 90; similarities scaled score = 1) 

Tester: In what way are a wheel and a ball alike? How are they the same? 

Child: The wheel ain’t as round as the ball. 

Tester: Well they are both round... 

Child: No, because the ball’s all the way round. It’s all shaped round, and it’s just 
shaped round. A wheel ain’t got a round around the edge. 

Tester: But they are both round and they both roll. That’s how they are the same. 

Child: ... but when it falls, the wheel falls on the side, and the ball can’t fall on the 
side. 

Tester: That’s right, but you've got to find out how they are the same. They are the 
same because they're both round and they both roll. Now tell me, in what way 
are a candle and a lamp alike? 

Child: A lamp can be light longer than a candle cos the candle just goes straight down 
when it's lit. The wax just melts away some of it. 

Tester: But they are alike because they both give light. 

Child: Yeah, but a light shows things good. Better than a candle cos it don't light 
up too much. 


FIGURE 2 

SUBTESTS YIELDING SEX DIFFERENCES. DISTRIBUTIONS OF SCALED SCORES (SHADED) 
COMPARED WITH EXPECTED VALUES (BOLD LINE) 

[Panels: WPPSI Mazes; WISC-R Information; WISC-R Block Design; WISC-R Coding. 
Vertical axis: proportion. Horizontal axis: scaled score, grouped as 1-5, 6-8, 9-11, 
12-14 and 15-19.] 


Sex differences 

Mean scores on the 20 subtests from both sessions for boys and girls are shown 
in Table 2. Two-way analysis of variance was used to compare these means after 
excluding data from nine girls selected at random to equalise numbers of the two 
sexes. The effect of sex did not reach significance, but there was a significant effect of 
subtests (F = 20.9; df = 19,2432; P < .01), and a significant interaction between sex 
and subtests (F = 26.99; df = 19,2432; P < .01). A test of multiple comparisons 
showed that boys did significantly better than girls on WPPSI Mazes, WISC-R 
Information and WISC-R Block Design, whereas girls excelled boys on WISC-R 
Coding. F-tests were used to compare variances for boys and girls on WPPSI and 
WISC-R subtests, but no differences reached significance. 


Correlation between IQ scores 
Correlations between WPPSI and WISC-R IQ scores are shown in Table 4. 
Values were also computed separately for boys and girls but significant sex differences 
in size of correlations were not found. The regression of WISC-R FS IQ (y) on 
WPPSI FS IQ (x) fits the equation y = 8.63 + 0.9x, with a standard error of estimate 
equal to 9.2. 


TABLE 4 
INTERCORRELATIONS BETWEEN IQ SCORES 

                         WPPSI                  WISC-R 
Scales              VIQ    PIQ    FSIQ      VIQ    PIQ    FSIQ 

WPPSI    PIQ        .57 
         FSIQ       .88    .89 

WISC-R   VIQ        .66    .56    .69 
         PIQ        .48    .70    .67       .66 
         FSIQ       .63    .69    .75        ?     .90 

? = value illegible in the source. 
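The regression of WISC-R FSIQ on WPPSI FSIQ can be approximately recovered from summary statistics alone. The sketch below is illustrative and uses the standard least-squares identities (slope = r × SD_y / SD_x); it takes the rounded means, standard deviations and correlation from Tables 2 and 4, so its output agrees with the printed equation only to within rounding. The function names are hypothetical, and no small-sample correction is applied to the standard error of estimate.

```python
import math

R = 0.75                      # WPPSI FSIQ with WISC-R FSIQ (Table 4)
MEAN_X, SD_X = 100.1, 11.41   # WPPSI FSIQ, Main Sample (Table 2)
MEAN_Y, SD_Y = 98.8, 13.66    # WISC-R FSIQ, Main Sample (Table 2)


def regression_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Least-squares line of y on x, plus the standard error of estimate."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    see = sd_y * math.sqrt(1.0 - r * r)
    return slope, intercept, see


slope, intercept, see = regression_from_summary(R, MEAN_X, SD_X, MEAN_Y, SD_Y)


def predict_wiscr(wppsi_fsiq):
    """Predicted WISC-R FSIQ for a given WPPSI FSIQ under the fitted line."""
    return intercept + slope * wppsi_fsiq


print(f"y = {intercept:.2f} + {slope:.2f}x, standard error of estimate = {see:.1f}")
print(f"Predicted WISC-R FSIQ for a WPPSI FSIQ of 85: {predict_wiscr(85):.1f}")
```

Because the slope is below 1, a child scoring well above or below the mean at the first testing is predicted to score somewhat closer to the mean at the second, the usual regression effect.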


Verbal-performance discrepancies 
A verbal-performance discrepancy score (V-P score) was calculated for each child 
by subtracting PIQ from VIQ. Distributions of V-P scores are shown in Figure 3. 
Although the mean discrepancy on the WPPSI is close to the expected value of zero, 
the distribution of WISC-R V-P scores is shifted to the left, reflecting the tendency 
for PIQ to exceed VIQ. Thus large discrepancies on WISC-R in favour of PIQ are 
more common than large discrepancies favouring VIQ: in no case where a discrepancy 
of 21 or more points occurred was the difference in favour of VIQ. The distributions 
of verbal-performance discrepancies found here may be compared with those expected 
on the basis of Wechsler's norms. Table 5 shows that on the WPPSI, larger 
discrepancies are slightly less frequent than expected, whereas on the WISC-R, the 
figures for expected and actual frequencies agree quite well. 


FIGURE 3 
DISTRIBUTIONS OF VERBAL-PERFORMANCE DISCREPANCIES. 


TABLE 5 

PERCENTAGE OF MAIN SAMPLE OBTAINING DISCREPANCIES BETWEEN 
VIQ AND PIQ ON WPPSI (AGE 4½) AND WISC-R (AGE 8½) 

                      Expected percentage        Actual percentage with 
                      obtaining given or         given or greater 
                      greater discrepancy*       discrepancy 
Discrepancy score                                WPPSI      WISC-R 
        9                     50                   43         53 
       15                     25                   20         29 
       17                     20                   14         19 
       21                     10                    5         10 
       25                      5                    2          4 
       30                      2                    1          2 
       33                      1                    0          1 
       42                      0.1                  0          0 

* Based on Sattler (1974). 


Stability of verbal-performance discrepancies 

The correlation between V-P scores on WPPSI and WISC-R was .42 (P < .001). 
Although statistically significant, the association between V-P scores on WPPSI and 
WISC-R is much weaker than the association between IQ scores on these scales. 
However, we might expect a much stronger association to emerge if we consider only 
highly reliable V-P scores, i.e. those which are unlikely to have arisen from error of 
measurement alone. Table 6 shows the frequency with which children with a 
verbal-performance discrepancy of at least 15 points on the WPPSI obtain a discrepancy 
at least as extreme on the WISC-R. Although the association is statistically significant 
(χ² = 4.46; P < .05), it is evident that the majority of children with a discrepancy of 15 
points or more at 4½ years do not have a discrepancy of this magnitude at 8½ years, 
and vice versa. 


TABLE 6 

ASSOCIATION BETWEEN EXTREME V-P SCORE ON WPPSI AND WISC-R 
(cell entries are numbers of children) 

                                                    WPPSI 
                                  V-P score of ±15         V-P score less 
                                  points or more           extreme than 
WISC-R                            extreme                  ±15 points 
V-P score of ±15 points or 
  more extreme                         12                       27 
V-P score less extreme than 
  ±15 points                           15                       85 
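The chi-square value quoted for Table 6 can be checked directly from the four cell counts. The following is a minimal sketch using the shortcut formula for a 2 × 2 table without continuity correction; whether the authors applied a correction is not stated, so that choice is an assumption (it happens to reproduce the printed figure), and the function name is hypothetical.

```python
# Cell counts from Table 6 (rows: WISC-R; columns: WPPSI).
BOTH_EXTREME = 12        # extreme V-P score on both tests
WISCR_ONLY = 27          # extreme on WISC-R only
WPPSI_ONLY = 15          # extreme on WPPSI only
NEITHER = 85             # extreme on neither test

CRITICAL_05_DF1 = 3.84   # tabled chi-square critical value, df = 1, alpha = .05


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table [[a, b], [c, d]], uncorrected."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


stat = chi_square_2x2(BOTH_EXTREME, WISCR_ONLY, WPPSI_ONLY, NEITHER)
persisting = BOTH_EXTREME / (BOTH_EXTREME + WPPSI_ONLY)

print(f"chi-square = {stat:.2f} (critical value {CRITICAL_05_DF1})")
print(f"Share of extreme WPPSI discrepancies still extreme on WISC-R: {persisting:.0%}")
```

The second figure makes the instability concrete: fewer than half of the children with an extreme discrepancy on the WPPSI still show one on the WISC-R.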
WISC-R internal consistency 

Split-half reliability coefficients for WISC-R subtests (except Coding) are shown 
in Table 7 and may be compared with the values for 8½-year-olds obtained by Wechsler 
using the same procedure. Wechsler’s procedure was also followed in computing 
reliability coefficients for IQ scores. Since no value was available for Coding, the 
estimate used by Wechsler was adopted. 


TABLE 7 
INTERNAL CONSISTENCY COEFFICIENTS FOR WISC-R 

                                            Wechsler 
Subtests                 This Sample        8½-year-olds 
Information                  .77                .80 
Similarities                 .84                .79 
Arithmetic                   .75                .69 
Vocabulary                   .77                .86 
Comprehension                .67                .73 
Picture Completion           .65                .85 
Picture Arrangement          .71                .69 
Block Design                 .81                .85 
Object Assembly              .57                .66 
VIQ                          .92                .92 
PIQ                          .88                .91 
FSIQ                         .91                .95 


Characteristics of uncooperative and untested children 

Table 8 shows scores for the 30 children seen at 4½ years who failed to complete 
all subtests, the 16 children seen at 4½ years who were not traced for follow-up, and the 
7 children missed at 4½ years because of parental refusal or family crises. Children 
who were not followed up do not differ significantly in WPPSI FSIQ from those who 
were followed (F < 1). One way analysis of variance was used to compare WISC-R 
FSIQ for the groups shown in Table 8, and revealed a significant difference (F = 4.58; 
df = 3,175; P < .01). Two-tailed t-tests for specific comparisons showed that the 
differences between the Main Sample and children who refused some subtests (t = 2.42; 
df = 156), the Main Sample and children who refused all subtests (t = 2.72; df = 148), 
and the Main Sample and children missed at 4½ years (t = 2.02; df = 144), were 
significant at the .05 level. No other differences reached significance. 


TABLE 8 
MEAN IQs: MAIN SAMPLE AND OTHER GROUPS 

                                            N             WPPSI FSIQ        WISC-R FSIQ 
                                       Boys   Girls       Mean    (SD)      Mean    (SD) 

Main sample                             65     74        100.1  (11.41)     98.8  (13.66) 
Children who were not followed up       10      6         99.3  (15.71) 
Children who refused 1-9 WPPSI 
  subtests                              11      8                           91.3  (12.70) 
Children who refused all WPPSI 
  subtests                               2      9                           87.5  (17.18) 
Children not given WPPSI because 
  of parental refusal or family 
  crises                                 4      3                           88.0  (17.91) 
DISCUSSION 

In the absence of adequate empirical data on the predictive validity of the 
WPPSI, authorities on IQ testing have been justly sceptical about the meaningfulness 
of preschool estimates of IQ. Sattler (1974) for example echoes the conclusions drawn 
30 years earlier by Bradway (1944) from her longitudinal study with the Stanford- 
Binet: “IQs obtained prior to age six must be interpreted with discretion” (p. 16). 

The data reported here at last enable us to say how much confidence may be 
placed in WPPSI IQ scores. From the regression equation described above we may 
say that there is a 9 to 1 chance that WISC-R IQ at 8½ years will be within 15 points 
of WPPSI IQ at 4½ years. It is noteworthy that the correlation of .75 between WPPSI 
and WISC-R IQ in this sample is not significantly lower than the value of .82 reported 
by Wechsler (1974) for a sample of 6-year-olds given the WPPSI and WISC-R within 
a period of one to three weeks of one another (one-tailed test using Fisher's z 
transformation: z = 1.09, NS). The relative intellectual status of children in this population 
appears remarkably stable over a four-year interval, with most of the observed 
variability in scores being attributable to test unreliability. Perhaps this is less sur- 
prising when we consider that the environmental circumstances of most children in the 
sample do not change markedly, and that a broad range of ability is represented. 
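Both numerical claims in this paragraph can be illustrated with a short calculation. The sketch below is hedged in two ways. First, the size of Wechsler's comparison sample is not given in the text; a value of 50 is assumed here because it reproduces the quoted z, and it should be read as a placeholder rather than a reported fact. Second, the "9 to 1" statement is interpreted as the probability that a normally distributed prediction error, with standard deviation equal to the standard error of estimate (9.2), falls within 15 points. Function names are hypothetical.

```python
import math

N_THIS_STUDY = 139   # Main Sample size
N_WECHSLER = 50      # ASSUMPTION: size of Wechsler's (1974) sample; not stated in the text


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher's r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # SE of the difference in z units
    return (z2 - z1) / se


def coverage_within(points, see):
    """P(|prediction error| <= points), assuming normally distributed errors."""
    return math.erf(points / (see * math.sqrt(2.0)))


z = fisher_z_difference(0.75, N_THIS_STUDY, 0.82, N_WECHSLER)
cover = coverage_within(15, 9.2)

print(f"Fisher z for .75 versus .82: {z:.2f} (one-tailed .05 critical value 1.645)")
print(f"Chance that WISC-R IQ lies within 15 points of prediction: {cover:.2f}")
```

Under these assumptions the difference between the two correlations falls well short of the one-tailed critical value, and the coverage is close to 0.90, which corresponds to odds of roughly 9 to 1.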

A second question to consider is how these results compare with Wechsler's 
norms. The sample tested here has a social class distribution similar to that for Great 
Britain as a whole and consists of fairly typical English rural children. If we accept 
that the WPPSI is valid with English children (Brittain, 1969; Neligan et al., 1976; 
cf. Yule et al., 1969) the results obtained with this sample at 4½ years may be taken as 
evidence of its unbiased constitution. Thus although this study in no way substitutes 
for a British standardisation, being limited as it is to white rural children of one age 
range, it does nevertheless provide an opportunity to evaluate the WISC-R with 
English children. 

Although mean values of WISC-R VIQ, PIQ and FSIQ did not differ significantly 
from 100, means of four subtests did differ significantly from expected values. The 
most unsatisfactory subtest in this respect is Similarities, where 9 per cent of children 
obtain a scaled score of 1. 

Given that there is good evidence that the administration of Similarities recom- 
mended by Wechsler yields a distorted distribution of scores, until the scale is re- 
standardised in Britain, we feel it would be justifiable to alter administration with 
British children so that the criterion for discontinuing only be applied from item 4 
onwards, irrespective of scores on items 1 to 3. 

Although it is a recognised aim of Wechsler to minimise sex differences in his 
scales (Wechsler, 1958), small but statistically significant sex differences are found in 
his standardisation populations (Seashore et al., 1950; Miele, 1958; Kaufman and 
Doppelt, 1976). The tendency found here for boys to do better than girls overall is 
consistent with previous research (Seashore et al., 1950; Kaufman and Doppelt, 1976; 
Brittain, 1969; Jones, 1962) as is the male superiority on WPPSI Mazes (Herman, 
1968; Yule et al., 1969) and WISC-R Information (Peters, 1976) and the female 
superiority on WISC-R Coding (Miele, 1958; Darley and Winitz, 1961; Lyle and 
Johnson, 1974; Peters, 1976). 

In sum, given the good agreement between the distribution of scores obtained 
here and expected values (see Table 3), it seems that the WISC-R can be used with 
confidence to assess Full Scale IQ in English children. However, even though our 
estimates of internal consistency of subtests agree well with those reported by Wechsler 
(1974), it would be rash to attempt subtest pattern analysis with this test, given that 
the means of several subtests differ significantly from 10 for one or both sexes. We 
recommend that the tables provided by Wechsler (1974) for testing significance of 
differences between scaled scores should not be used with English children. 

Similarly, we would urge caution in the interpretation of verbal-performance 
discrepancies. Discrepancies of 15 points or more are reliable in the sense that they 
are unlikely to arise from error of measurement alone, but the data presented here 
show that such discrepancies are neither unusual nor particularly stable over time. 
Further, we found that on the WISC-R, large discrepancies favouring VIQ were much 
less common than those favouring PIQ, suggesting that direction as well as size of a 
discrepancy will determine its significance in this test. (However, although results as 
extreme as this have not been previously reported, there is evidence that discrepancies 
favouring PIQ are characteristic of rural children (Seashore et al., 1950; Seashore, 
1951), so this imbalance may not typify English children.) Also, evidence relating 
large discrepancies in children to external criteria such as reading retardation or 
neurological disease is weak. For example, Ackerman et al. (1971) showed that many 
children with large discrepancies are neurologically and behaviourally normal, whereas 
other authors have found that a minority of children with demonstrable neurological 
abnormalities have large discrepancies (Rowley, 1961; Pihl, 1968; Rourke et al., 
1973; Black, 1976). In general, there seems little justification for attaching much 
importance to discrepancies of less than 25 points. A more detailed examination of 
the relationship between verbal-performance discrepancies and external criteria is 
currently in preparation. 


Failure to co-operate with testing at the age of 4½ years was not uncommon in 
this study. We are not aware of previous research on children from normal popula- 
tions who do not co-operate, and it seems likely that such children are typically 
discarded from studies. We found that although the majority of such children develop 
normally, there was a raised incidence of children obtaining low WISC-R IQs: 33 per 
cent of children who did not co-operate with some or all subtests at 4½ years obtained 
WISC-R FSIQs of less than 85, compared with 11 per cent of other children. It is 
impossible to disentangle cognitive, emotional and motivational factors responsible for 
poor performance in such cases, but it is noteworthy that four of the 10 children who 
had refused the WPPSI and who did poorly on the WISC-R were thought by the tester 
to have markedly abnormal behaviour in the WISC-R test session. Although it would 
be wrong to assume that non-co-operative children will have problems, they should 
be carefully followed up until the tester is satisfied that there is no cause for concern. 


We may also note that four out of the seven children who missed the preschool 
screening because of parental refusal or family crises obtained an IQ under 85 at 8½ 
years. It was our impression that parents were less willing to bring their child for 
testing if they suspected he had difficulties, or if other children in the family were in 
any way abnormal. Although numbers are too small for firm conclusions, this result 
suggests that children who are missed by this sort of screening procedure include a 
high proportion at risk for educational failure. 

COGNITIVE STYLE AND ADVANCE ORGANIZERS 
IN LEARNING AND RETENTION 


By D. J. SATTERLY AND I. G. TELFER 
(School of Education, University of Bristol) 


SUMMARY.— The interaction of field independence and advance organizer in learning and 
retention was studied in 96 boys and 84 girls aged 14-15 years. Field independence was 
stratified into three levels and pupils randomly assigned to twelve teaching groups, 
followed by random allocation of teachers to groups and groups to one of three treat- 
ments (lessons; lessons plus advance organizer; lessons plus advance organizer plus 
specific reference to its organizing properties). Two lessons on meaning, analysis, and 
construction of words were taught and the groups compared in recall and transfer. 
Four-way analysis of variance gave significant differences between cognitive styles in 
recall and transfer and between learning and retention. Residual estimates of learning 
were obtained and a significant interaction of style with treatment observed. Field 
dependent pupils achieved greatest gains where the organizer was used with specific 
reference to its organizing properties. 


INTRODUCTION 


STUDIES of school learning have identified a large number of factors associated with 
achievement but no coherent theory linking them has yet been developed. Recent 
attempts to understand differences in school progress have been chiefly observational 
or quasi-experimental enquiries incorporating psychological and sociological studies 
of events or factors inside and outside classrooms. Conclusions have been varied. 
Some studies have claimed that greater achievement gains are made under certain 
‘styles’ of teaching (e.g. Bennett, 1976); that length of school day and amount of 
teaching are determinants of achievement (e.g. Stallings, 1976; Fisher et al., 1977) and 
that teachers skilled in the managerial techniques necessary to maintain a high level 
of work activity among pupils provoke more learning (Westbury, 1977). By contrast, 
other types of enquiry have stressed that school and teacher differences exert little or 
no effect on how much children learn and instead attribute differences in attainment 
to home and background variables (e.g. Jencks, 1973). 


One problem in the integration of findings which apparently contradict one 
another into a model of the teaching learning process is that the studies have invariably 
adopted a mixture of explanatory concepts drawn from a number of levels of analysis. 
Attempts to develop a model of school learning which employs a small number of 
simplifying concepts, and from which testable hypotheses can be drawn, have been 
made by Carroll (1963), Harnischfeger and Wiley (1975), and Bennett (1978). A basic 
assumption of these models seems to be that teaching behaviour does not directly 
influence a child's achievement. If differences between teachers can be eliminated 
from explanation or prediction of achievement then it is a relatively simple matter to 
regard individual differences between children also as being of comparatively little 
importance. Yet the range of individual differences in classrooms is one of the most 
striking phenomena open to the observer as Bruner (1966) and Cronbach (1977) have 
discussed. The expectation that certain definable differences among children interact 
with characteristics of teaching method is common and implied by a great deal of the 
literature in educational psychology. Paradoxically, attempts to ‘ pin down’ these 
interactions in experimental studies have often proved disappointing (Cronbach and 
Snow, 1977). 


The study reported in this paper is of a predicted aptitude treatment interaction. 
It examines the relevance of aspects of a cognitive theory of school learning which has 
implications for teaching (Ausubel, 1967) and of an individual difference construct 
(field independence) which is held to have considerable theoretical relevance for 
education (Witkin et al., 1977). 


In his theory of verbal learning Ausubel describes how new learning material 
which is ‘potentially meaningful’ to a child becomes meaningful by being related to 
what the child already knows (his ‘cognitive structure’). Several conditions must be 
met before the material can be said to have been learnt meaningfully but, of these, by 
far the most crucial is the existing organization of the child’s ideas. Ausubel has 
proposed that the presentation of an unfamiliar body of knowledge should take the 
form of general and inclusive ideas to ensure ‘ ideational scaffolding’ for the sub- 
sequently learned new knowledge to be assimilated to it. With relatively unfamiliar 
material Ausubel has recommended the use of an expository organizer to provide a 
holistic conceptual structure to which learners may relate the new ideas. Such 
organizers consist of concepts and propositions presented in advance of new learning 
material and bear a superordinate relationship to the subordinate concepts and specific 
facts presented in subsequent new knowledge (Ausubel, 1963). A number of enquiries 
have suggested the effectiveness of learning using organizers (Allen, 1970; Ausubel, 
1960, 1978; Ausubel and Youssef, 1963; Ausubel et al., 1968; Cohen, 1977; Fitz- 
gerald and Ausubel, 1963; Kuhn and Novak, 1971; Lawton, 1977). Not all studies 
have supported Ausubel’s theory, however (Clawson and Barnes, 1973; Barnes and 
Clawson, 1975; Barron, 1971). Apparently the only interaction of organizers with 
individual differences which has been investigated has been general ability as defined 
by IQ tests, but their relative effects on bright and dull children have been the subject 
of conflicting results (Ausubel et al., 1978). 


There are a number of problems in testing Ausubel’s theory and some of these 
have been described by Lawton and Wanska (1977). Fundamental to Ausubel's 
theories, however, is the aid that organizers provide to the structuring of material. 
Individual differences in the structuring of experience have been extensively investi- 
gated by Witkin in studies of cognitive style, especially field independence. This 
construct describes the tendency of learners to extract a given figure by analysis from 
an embedding context and to impose structure where it is lacking in a number of types 
of situation (perceptual and intellectual). The experience of field independent subjects 
in a number of areas is relatively articulated, that of field dependents more global. It 
has been claimed that a substantial degree of self-consistency exists in a person's 
functioning across diverse areas of experience and that tapping a person's performance 
in perception through the Embedded Figures Test reveals “his general tendency to 
function at a more or less differentiated level” (Witkin et al., 1971, p. 14). Stasz et 
al. (1976) investigated the conceptual structures made by field independent and field 
dependent learners when the organization of the information was left to the individual; 
major differences were found. Field independents employed clearly defined concep- 
tual groups whereas field dependents were far more likely to leave ideas clustered in 
large, loosely organized groups. Similar differences in structuring have been found 
by other workers (Moore et al., 1970; Nebelkopf and Dreyer, 1970). 


The chief hypothesis tested was that use of an advance organizer would interact 
with field independent cognitive style. More specifically the hypothesis referred to 
performance in learning and retention in tests of recall and transfer. No effect of 
organizers was predicted for recall because of the ease with which items can be rotely 
learned using superficial forms of processing, without the attainment of meaning 
(Craik and Lockhart, 1972; Craik and Tulving, 1975). A significant interaction was, 
however, predicted between cognitive style and treatment in tests of transfer, where 
organizers were expected to enhance the performance of field dependent, though not 
field independent, subjects. For transfer in verbal learning to take place, the material 
requires a deeper level of processing than that which is adequate for recall since the 
semantic properties must be grasped if the appropriateness of the material for a new 
situation is to be recognized. Because an organizer is designed to enhance meaningful 
learning and an interval between initial learning and later retention permits progressive 
organization (or deeper processing in Craik and Lockhart's terms) the organizer is 
expected to have greater effect on the retention transfer scores of field dependent than 
of field independent children. This is because the former display far more limited 
structuring ability than the latter. Moreover, it is possible that learners, especially 
when weak in structuring ability, may fail to understand the properties of organizers 
and their relationship with the more specific ideas to be learnt. This may be particu- 
larly noticeable among younger children. For this reason a treatment was incorporated 
into the design in which the relationship between the more general and inclusive ideas 
of the organizer and the specific information was explicitly demonstrated. Most 
learning was predicted for all children irrespective of style where the organizer was 
used and its potential organizing properties demonstrated (treatment A3 below). 


METHOD 

Sample 

The children who took part in the main study were 180 pupils (96 boys, 84 girls; 
mean age 14 years 10 months) from an English comprehensive school. The mean IQ 
on entry to the school was 100.7 with standard deviation 13.8. These values are close 
to the population values and do not indicate any marked departure for the sample from 
average ability (z = .56). In addition, 102 pupils of the same age but from a different 
school took part in an enquiry to estimate the effects on scores of taking the test and 
receiving the organizer but without teaching. 


Measurement of cognitive style 

A version of embedded figures was used. The test consists of 24 rows of shapes 
as used by Gardner et al. (1960) derived from those first presented by Thurstone (1944). 
Each row consists of a simple shape and four complex figures some of which contained 
the simple shape. For each complex figure in the row, the subjects were required to 
indicate by a tick those that contained the simple shape and by a cross those that did 
not. The raw score was the number of items correctly marked (with a tick or cross) 
minus the number of items incorrectly marked. The resulting distribution of field 
independence closely approximated the normal distribution (chi² = .92). Although 
Witkin's original work employed more than one criterion measure of field indepen- 
dence subsequent enquiries have shown that these tests often have only low correla- 
tions (Vernon, 1972). Nevertheless, the group version of EFT has been extensively 
used as the primary index of field independence (Jackson et al., 1964) and is advocated 
by Witkin for this purpose (Witkin et al., 1971). 
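
The scoring rule described above (items correctly marked minus items incorrectly marked) may be sketched as follows. This is an illustration only: the function name `eft_raw_score`, the coding of ticks, crosses and unmarked figures as True, False and None, and the treatment of unmarked figures as neither right nor wrong are assumptions, not details given in the description of the test.

```python
def eft_raw_score(responses, answer_key):
    """Raw score = figures correctly marked minus figures incorrectly marked.

    responses  : list of True (tick), False (cross) or None (left unmarked)
    answer_key : list of bool, True where the complex figure contains
                 the simple shape

    Assumption: an unmarked figure counts neither for nor against the pupil.
    """
    correct = 0
    incorrect = 0
    for response, truth in zip(responses, answer_key):
        if response is None:
            continue  # unmarked: ignored under the stated assumption
        if response == truth:
            correct += 1
        else:
            incorrect += 1
    return correct - incorrect
```

Subtracting the incorrect marks penalises guessing, so that a pupil who ticks every figure indiscriminately gains no advantage over one who marks only the figures about which he is sure.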


Design of the experiment 

The field independence distribution was stratified into three levels each containing 
60 subjects which were described as ‘field independent’, ‘intermediate’, and ‘field 
dependent’ groups. From within these blocks pupils were randomly assigned to one 
of twelve teaching groups (15 pupils per group), each with five representatives of each 
level of style. Twelve teachers were then randomly assigned to the teaching groups 
and four groups randomly allocated to each of three experimental conditions: 


Treatment A1: Two lessons on word structure. (Control group.) 

Treatment A2: Advance organizer plus two lessons on word structure. 

Treatment A3: Advance organizer plus two lessons on word structure plus 
specific references at fixed points in the lessons to induce meaningful set by drawing 
the learner's attention to the superordinate ideas presented in the organizer. 


Factor A in the design corresponds with treatments (3 levels); factor B, the 
random factor of teachers within treatments (12 levels); factor C, cognitive style (3 
levels); and factor D, testing occasions (2 levels: learning, and retention 1 week later), 
which is a repeated measure. Two further control groups, which did not figure in the 
main analysis, were also used to check whether the use of advance organizer alone led 
to learning of the material in the test. Four classes in a similar comprehensive school 
were employed for this specific purpose. Two classes were taught the organizer 
(N = 50) and then took the test, and two (N = 52) took the test alone. 
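
The allocation procedure described in this section (stratification into three blocks, random assignment within blocks to twelve teaching groups, and random allocation of groups to treatments) may be sketched as below. The function and variable names are hypothetical, ties in embedded figures scores are broken arbitrarily, and the random assignment of teachers to groups is omitted; none of these details is specified in the text.

```python
import random

STYLE_LEVELS = ["field independent", "intermediate", "field dependent"]


def assign_design(scores, n_groups=12, treatments=("A1", "A2", "A3"), seed=0):
    """Stratify pupils by score, deal each stratum at random into the
    teaching groups, then allocate groups at random to treatments.

    scores : dict mapping a pupil identifier to an embedded figures score
    """
    rng = random.Random(seed)
    # Highest scores are taken to be the most field independent.
    ranked = sorted(scores, key=scores.get, reverse=True)
    block_size = len(ranked) // len(STYLE_LEVELS)

    groups = [[] for _ in range(n_groups)]
    for level_index, level in enumerate(STYLE_LEVELS):
        block = ranked[level_index * block_size:(level_index + 1) * block_size]
        rng.shuffle(block)  # random order within the stratum
        for position, pupil in enumerate(block):
            groups[position % n_groups].append((pupil, level))

    # Random allocation of equal numbers of groups to each treatment.
    group_order = list(range(n_groups))
    rng.shuffle(group_order)
    groups_per_treatment = n_groups // len(treatments)
    treatment_of_group = {
        group: treatments[rank // groups_per_treatment]
        for rank, group in enumerate(group_order)
    }
    return groups, treatment_of_group
```

With 180 pupils this yields twelve groups of 15, each containing five pupils from every level of style, and four groups under each treatment, which matches the layout reported above.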


Materials 
The content of the lessons was chosen using three criteria: 


(a) It should be similar to the type of material expected by pupils as part of their 
normal English curriculum, to maintain normal classroom atmosphere. 


(b) It should be perceived as educationally worthwhile by the teachers concerned 
to ensure their full co-operation. 


(c) It should be unfamiliar to the pupils. 


All material eventually used in the experiment was selected as a result of class- 
room trials. All teachers were experienced and lesson plans and timing were fully 
discussed with them to ensure close similarity of teaching method. The administrative 
arrangements were such as to eliminate the possibility of communication between 
groups. As a control strategy, pupils in classes assigned to treatment A1, which did 
not involve the advance organizer, received teaching on the history of the English 
language for the same length of time. This ensured that all groups spent the same 
length of time in learning the material to be tested, and equalized the length of 
adjustment to the teacher who was unfamiliar to some of the pupils in the teaching 
group set up for the experiment. 


The material taught to all groups contained four main ideas: 
(i) that the smallest unit of meaning is not the word; 


(ii) that whole words can be split into parts that carry definite meanings, and the 
meanings of the word may be the sum of the meanings of the parts or more than this; 


(iii) that parts of words can be recombined in different ways to give different 
meanings; 


(iv) the methods by which meanings can be attached to new words. 


The teaching method used was instructional and the new ideas were taught by 
analysing the meaning of certain common words and from this identifying the concept 
of a morpheme. Five methods for making new words to carry specific meanings were 
then taught: the combination of words, the use of affixes, the shortening of familiar 
words, the borrowing of foreign words and onomatopoeia. 


The advance organizer was designed to facilitate assimilation of these ideas by 
pointing out clearly to the children the concept of meaning and the way in which the 
meanings of sentences are obtained from their structure thus making explicit what was 
already familiar but not consciously understood, and thereby providing an overall 
framework into which the material on the subordinate structure of words could easily 
be fitted. This was taught by using examples and exercises showing how different 
arrangements of the same words could have different meanings and by showing how 
the meanings of unfamiliar words could be obtained by use of a dictionary or context. 
In treatment A3 the references to the organizing properties of the organizer were made 
by drawing the pupils' attention to what they had learnt of the concept of the struc- 
turing of meaning of which word building is a specific instance. This reference was 
made at five points during the lessons. 

Testing learning and retention 

Part 1 (‘recall’) of the test consisted of 30 items to test the recall of the content 
of both lessons and application of the principles to familiar problems. Part 2 (‘transfer’) 
of the test consisted of 37 items and provided children with factual information 
to be transferred to unfamiliar problems. Test scores were marked for each part 
separately and combined to give a total score. Testing was carried out on two 
occasions, following the lessons (‘learning test’) and again one week later (‘retention 
test’). Reliability estimates for the test were: part 1, .81; part 2, .86; and total score, 
.91, after application of the Spearman-Brown formula. 

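The Spearman-Brown formula mentioned above steps up the correlation between parts of a test to an estimate of the reliability of the lengthened test. A minimal sketch follows; the function names are illustrative, and the default lengthening factor of 2 (the usual split-half case) is an assumption, since the text does not say how the test was divided.

```python
def spearman_brown(r_part, factor=2.0):
    """Predicted reliability when a test is lengthened by `factor`.

    r_part : correlation between comparable parts (e.g. two halves)
    factor : how many times longer the full test is than one part
             (2.0 for a split-half estimate; assumed, not stated in the text)
    """
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)


def spearman_brown_inverse(r_full, factor=2.0):
    """Part-test correlation implied by a stepped-up reliability."""
    return r_full / (factor - (factor - 1.0) * r_full)
```

The correction is needed because the correlation between two halves describes a test of only half the length actually administered, and so understates the reliability of the whole.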

RESULTS 


Distributions of scores on all tests were examined but no significant departure 
from normality was indicated by the chi-square goodness of fit test. Sex differences 
in scores on the test of cognitive style, intelligence, learning, and retention were also 
studied, but the largest t ratio obtained was t(178) = 0.87, which was non-significant, 
and the results were therefore analysed for male and female subjects combined. 


Effects of organizer without teaching 

The mean score of pupils who had received the advance organizer, on parts 1 and 2 
of the test combined (N = 50; mean = 12.06; SD = 5.64), was not significantly different 
from the mean score of those who had been tested without learning of the advance 
organizer (N = 52; mean = 12.78; SD = 4.49). Both mean scores were significantly 
lower than those of subjects who took part in the experiment (P < .001). Any 
observable differences between the three main treatments are, therefore, unlikely to be 
attributable to the organizer alone. 
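
The comparison of the two control groups can be checked from the summary figures alone. The sketch below assumes an independent-samples t test with pooled variance; the text does not name the test actually used, and the function name is illustrative.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Independent-samples t statistic from summary statistics,
    using the pooled estimate of variance. Returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Organizer-plus-test group against test-only group, as reported above.
t_value, df = pooled_t(50, 12.06, 5.64, 52, 12.78, 4.49)
```

Under these assumptions the t ratio is about -0.71 on 100 degrees of freedom, well short of the value of roughly 1.98 required at the 5 per cent level, which agrees with the reported absence of a significant difference.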


Comparisons of the three treatments and levels of cognitive style 

Although the two parts of the test overlapped, the correlation (r = .59, P < .01) was 
low enough to justify the separate analysis of parts 1 and 2 required by the hypothesis 
that the interaction of field independence with treatment would be found in the test of 
transfer but not of recall of information. 


Analysis of variance was used to examine the differences between and within 
treatments for recall (learning and retention) and transfer (learning and retention) tests. 
Main effects of treatments, although not of prime interest, were first examined using 
the teachers’ within treatment variance terms as the error term. The largest F-ratio 
was found for the learning-retention difference (F = 2.65, df = 2,9), but this fell far short 
of the value required for significance. Since differences between teachers within 
treatments had been found to be uniformly non-significant and the test of the main 
effect of treatments using this term had also proved non-significant, the remaining 
analyses of variance were simplified by the omission of the teaching factor. 


Mean scores in tests of recall and transfer are presented in Tables 1 and 2 
respectively. 


TABLE 1 
MEAN SCORES IN RECALL TEST (LEARNING AND RETENTION) 

                                          Treatment 
Test         Cognitive style         A1       A2       A3 

LEARNING     Field independent     15.30    14.20    16.60 
             Intermediate          12.20    12.10    13.30 
             Field dependent       12.35    11.60    10.25 

RETENTION    Field independent     15.60    15.30    17.10 
             Intermediate          13.30    12.80    16.05 
             Field dependent       12.45    11.25    11.50 

TABLE 2 
MEAN SCORES IN TRANSFER TEST (LEARNING AND RETENTION) 

                                          Treatment 
Test         Cognitive style         A1       A2       A3 

LEARNING     Field independent     13.25    13.15    15.00 
             Intermediate          11.20    10.05    12.40 
             Field dependent        7.35     9.15     9.20 

RETENTION    Field independent     17.85    15.45    19.25 
             Intermediate          11.55    12.20    14.85 
             Field dependent        8.20     9.50    12.55 


Results of the repeated measures analysis of variance of recall scores revealed a 
significant difference for cognitive style (F = 15.43; df = 2,171; P < .001) and for the 
difference between learning and retention (F = 7.35; df = 1,171; P < .001). For recall, 
as predicted, there were no significant differences between treatments. In the transfer 
scores, differences again attained significance in respect of styles (P < .001) and for 
learning and retention (P < .001). In addition a significant difference for treatments 
(F = 4.61; df = 2,171; P < .05) and for the interaction of styles with learning and 
retention (F = 4.88; df = 2,171; P < .01) were obtained. It will be noted that the 
difference between treatments, which emerges when children within groups were 
employed as the error term, did not appear when the more appropriate teachers within 
treatments term had been employed. Interactions involving cognitive style had been 
predicted and were subject to closer analysis in studies of ‘ gain’ over the period of 
one week between learning and retention scores. 


Scores obtained under treatment A3 were significantly higher than under the 
other two treatments. In order to assess gain during the retention period of one week, 
it was desirable to obtain an estimate of change for all groups since there was a 
difference between the groups in learning scores. The measurement of change or gain 
in classroom and similar research presents serious problems (Cronbach and Furby, 
1970; Cronbach, 1976) but an index of relative amount may be obtained by calculation 
of residual gain scores (DuBois, 1957; O'Connor, 1972). This method uses the 
regression line which provides the best fit to all points and singles out those children 
or groups which gained more (positive residuals) or less (negative residuals) than 
would be predicted by the use of that line. 
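
The calculation of residual gain scores described above may be sketched as follows: retention scores are regressed on learning scores by ordinary least squares, and each pupil's residual is the amount by which his retention score exceeds or falls short of the value predicted from the line. The function name is illustrative and the sketch is not a reconstruction of the computing procedure actually used.

```python
def residual_gains(learning, retention):
    """Residual gain = observed retention score minus the retention score
    predicted from the learning score by the least-squares regression line
    fitted to all pupils. Positive values mark a larger gain than predicted."""
    n = len(learning)
    mean_x = sum(learning) / n
    mean_y = sum(retention) / n
    sum_sq_x = sum((x - mean_x) ** 2 for x in learning)
    sum_cross = sum((x - mean_x) * (y - mean_y)
                    for x, y in zip(learning, retention))
    slope = sum_cross / sum_sq_x
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(learning, retention)]
```

Because the residuals from a least-squares line sum to zero and are uncorrelated with the learning scores, they give an index of relative gain which does not simply reflect initial differences between groups.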


Analysis of covariance, though widely regarded by many as a suitable technique 
for data of this type to adjust for initial differences in learning scores, could not be 
applied to investigate the effects of treatment on retention scores since the covariate, 
having been obtained after application of the treatments, is correlated with them. 
This technique, though preferred by Overall and Woodward (1977) and Borich (1977), 
would lead, therefore, to the removal of variance correctly attributable to the effects 
of the treatments. Two-way analysis of variance of the residual gain scores was, 
therefore, carried out, their distribution having been shown to be approximately 
normal by Geary's test (D'Agostino, 1970). The predicted interaction of style with 
treatment was identified (F = 6.42; df = 4,171; P < .01). Significance tests on 
differences between pairs of means between and within styles were made on the data 
graphed in Figure 1. 


Of greatest interest is the significantly higher gain of field dependent pupils under 
treatment A3 than under either of the others (P < .01). For pupils of intermediate 
cognitive style the mean score under treatment A3 was significantly higher than under 
treatment A1 (P < .01). Scores for all groups collectively receiving treatment A3 were 
significantly higher than those which had received no organizer (P < .01). 

FIGURE 1 
MEAN RESIDUAL GAIN SCORES IN TRANSFER TEST 

One problem in the conduct of research designed to investigate the interaction of 
individual differences with teaching methods is that the former are almost always 
covariates of other individual difference variables. Assigning field independence at 
random to treatments, therefore, simultaneously manipulates other individual differ- 
ences correlated with it. Justification for doing so, therefore, rests on two criteria: 


(i) that field independence as an operational variable has theoretical interaction 
with the treatments studied, and 


(ii) that the correlations between the intelligence test (probably the best single 
summary index of individual differences related to learning) and learning/retention are 
uniformly smaller than those between field independence and the dependent variables. 
This difference is demonstrated when the average partial correlation of field independence 
with learning and retention after IQ has been removed (.240) is compared with 
the much lower corresponding value for IQ after removal of field 
independence (.121). 
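
The partial correlations compared under criterion (ii) are first-order partials, which can be obtained from the three zero-order correlations by the standard formula. The sketch below is illustrative only: the function name is hypothetical, and the zero-order correlations from which the reported averages were derived are not given in the text.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant.

    For example, x = field independence, y = a learning or retention score,
    z = IQ; exchanging the roles of x and z gives the corresponding
    value for IQ with field independence removed.
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator
```

When the variable partialled out is unrelated to both of the others the partial correlation equals the zero-order value; the more strongly it is related to them, the more the partial shrinks.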


Since intelligence test scores and embedded figures scores were each obtained a few 
days prior to teaching, any attenuation of predictive validity over time has occurred 
similarly for both variables. Nevertheless, a demonstration that field independence 
shares substantial variance with learning and retention after variance attributable to 
the construct of general intelligence has been removed would strengthen the claim of 
field independence to be an educationally relevant dimension of individual difference. 
This possibility has been investigated in a further study (Satterly, 1979). 


DISCUSSION 


The use of advance organizers is based on the assumption that cognitive structure 
is hierarchically organized in terms of more inclusive concepts and that their function 
is to enhance meaningful learning by the incorporation of new ideas. The results of 
the present study support the effectiveness of the organizer in promoting learning and 
retention among pupils with certain learning characteristics but not among others 
and, therefore, provide evidence of the presence of an aptitude-treatment interaction 
of educational relevance. Thus field dependent subjects, whose ability to deal with 
formal structures is limited, are helped by its use but only where the teacher emphasizes 
its properties during the lesson. This finding shows the importance of Ausubel’s 
insistence that meaningful learning takes place with the active intention of the learner 
to incorporate new ideas into his cognitive structure. But, obviously, intention is not 
enough: the learner must himself be aware of the potential of the organizer during 
learning. This suggests, therefore, that the use of organizers with children of this age 
group should be accompanied by a strategy which demonstrates their relevance to the 
new subordinate material. The failure of the organizer to promote learning amongst 
field independent and intermediate pupils and when used in the absence of reference 
to their function (treatment A2) may be explicable in a number of ways. Firstly, 
organizers may be at their most effective with materials above a certain threshold of 
complexity, and the material employed in this study may have fallen below that threshold for 
field independent pupils. Secondly, the organizer may have been successful in its 
intention but the resulting differences too small in magnitude to be detected by the 
tests employed or be of a quality which the tests were unsuited to assess. A third 
possibility, and one which seems most likely, is that the organizers may have been 
effective in the introduction of relevant subsuming ideas but the children have failed 
to exercise the operations of subsumption during learning or the period between 
learning and retention. This last possibility is consistent with the higher mean score 
under treatment A3. 


The apparent failure of the organizer in treatment A2 cannot be taken as positive 
evidence that organizers are of little value in the classroom. Unfortunately, an 
organizer can only be identified in the presence of learning and retention: its value 
cannot be falsified by their absence. However, it seems reasonable to claim that if 
differences are induced by the use of an organizer and are likely to be of greater benefit 
to certain pupils than to others, they remain of less importance than the normal 
differences which occur between children in classrooms. It is clear that the cognitive 
style, as measured by the embedded figures test, is an educationally relevant measure 
of individual differences since differences in style are associated with differences in 
learning and retention. The larger difference in favour of field independent subjects 
in part 2 (transfer) than in part 1 (recall) of the test accords with the argument by 
Witkin et al. (1962) that field independence favours subjects in tasks where greater 
cognitive activity is necessary to acquire meaning than where pupils are merely 
required to reproduce what they have seen or heard, as in part 1 of the test. 


The interaction of style with advance organizer demonstrated by this study 
indicates that in the complex task of processing material which must be made 
meaningful if it is to be transferred and retrieved, considerable gains may be obtained 
where the children are made consciously aware of the superordination-subordination 
process especially where such pupils lack the facility for the articulation of input that 
is characteristic of field dependence. 
COVARIATION OF COGNITIVE STYLES, INTELLIGENCE 
AND ACHIEVEMENT 


By D. J. SATTERLY 
(School of Education, University of Bristol) 


Summary. A hierarchical factor analysis of the scores of 430 pupils was performed on a matrix 
of correlations between three measures of cognitive style (analytic-synthetic; field independence; 
levelling-sharpening) and achievement in mathematics, geography and English. Factors of 
general ability, field independence and levelling-sharpening were identified. Analysis of a 
reduced correlation matrix suggested that field independence shares a small amount of variance 
with achievement after control for general ability. 


INTRODUCTION 


An investigation was undertaken to examine the covariance of three cognitive styles with 
each other and with the general intelligence construct, personality and school achievement. 
Three tests of attainment (in mathematics, English and geography) were administered to 430 
pupils (aged 10-11 years) and scores on the test and re-test (three weeks later) obtained. Since 
there were no significant differences between scores of boys and girls on any of the tests the 
analysis was carried out on the entire sample. Field independence was assessed using the 
embedded figures test and levelling-sharpening by the test devised by Gardner et al. (1960). 
Three measures of this cognitive style were obtained (percentage increment error, rank order 
and lag score). Analytic style preference test as described by Satterly and Brimer (1971) and 
by Satterly (1976) was used to assess preference for analytic or synthetic styles of thinking, 
and two personality dimensions (introversion-extraversion; neuroticism-stability) were 
measured using the Eysenck Personality Inventory. Measurement of IQ was obtained using 
the NFER verbal reasoning test. 


Inspection of the correlation matrix (Table 1) reveals a uniformly non-significant 
relationship between the measure of analytic synthetic cognitive style and achievement, and 
between personality and achievement. The correlation matrix was factorised using principal 
factor analysis and a hierarchical analysis, similar to that proposed by Humphreys and 
Parsons (1977), was carried out. Three factors were selected for rotation to oblique simple 
structure using biquartimin. A second order factor was extracted from the correlations 
between first order factors and placed into a single order using the Schmid-Leiman trans- 
formation (1957). Results of this analysis are presented in Table 2: 
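
The Schmid-Leiman step described above can be illustrated with a minimal sketch. The loadings used here are hypothetical and are not those of the present analysis, and the function name `schmid_leiman` is an assumption made for illustration. Given the pattern loadings of the tests on the oblique first-order factors, and the loadings of those factors on a single second-order factor, the transformation yields each test's loading on the general factor and on residualised, mutually orthogonal group factors.

```python
import math


def schmid_leiman(pattern, second_order):
    """Orthogonalise a two-level factor solution (Schmid-Leiman transformation).

    pattern      -- rows are tests, columns are oblique first-order factors
    second_order -- loading of each first-order factor on the general factor
    Returns (general, groups): loadings of each test on the general factor
    and on the residualised group factors.
    """
    # Loading on the general factor: pattern row times second-order loadings.
    general = [sum(f * g for f, g in zip(row, second_order)) for row in pattern]
    # Each first-order factor keeps only the variance not explained by g.
    residual = [math.sqrt(1.0 - g * g) for g in second_order]
    groups = [[f * r for f, r in zip(row, residual)] for row in pattern]
    return general, groups


# Hypothetical loadings: four tests on two correlated first-order factors.
pattern = [[0.70, 0.10], [0.60, 0.00], [0.05, 0.80], [0.10, 0.65]]
second_order = [0.60, 0.50]

general, groups = schmid_leiman(pattern, second_order)
for g, row in zip(general, groups):
    print(round(g, 3), [round(x, 3) for x in row])
```

In this sketch the communality of each test is unchanged by the transformation, provided the correlation between the first-order factors is the one implied by their second-order loadings (here 0.60 x 0.50 = 0.30).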


The general factor is most clearly defined by the tests of achievement and IQ. Correla- 
tions of field independence and levelling-sharpening with achievement are seen in part to be 
attributable to the variance common to the general factor. However, there is considerable 
common variance on Factor II between embedded figures and the achievement tests especially 
in geography and mathematics. Factor III isolates the levelling-sharpening dimension and 
Factor IV the two English tests. 


It will be noted in the above analysis that two variables (analytic-synthetic style and 
extraversion) have particularly low communalities. Their presence may well have served to 
confuse rather than to clarify the factorial structure. Similarly, there is just as much reason, 
psychologically speaking, to expect the two mathematics tests and the two geography tests 
to define separate factors as the two English tests. The factor analysis was, therefore, repeated 
with the elimination of test 1 and 7 but with the inclusion of a composite score for each pair 
of achievement tests in place of the learning and retention scores. The result of this analysis 
on the reduced correlation matrix is presented as Table 3 and leads to slightly different 
conclusions. 


Factor I, as before, is the general factor. Factor II is the factor of levelling-sharpening. 
Factor III loads the three attainment tests and embedded figures. Once again the highest 
loadings are obtained by embedded figures, mathematics and geography, but there is also a 
significant loading for the English test. It is tentatively identified as a factor of field independence. 
The factor of verbal ability has now disappeared and Factor IV is the relatively small factor 
defined by the IQ test and on which embedded figures receives a small but significant loading. 
The findings provide some support for the independence of cognitive style from general 
intelligence. 


TABLE 2 
FACTOR PATTERN OF THE TESTS 

                                                          Factor 
Test                                                I      II     III     IV 
 1. Preference for analytic-synthetic style       -044    031    -056   -118 
 2. Embedded figures (field independence)          390    498     045    013 
 3. Levelling-sharpening (rank order)              510    069     850    107 
 4. Levelling-sharpening (increment error)         511    074     848    102 
 5. Levelling-sharpening (lag score, reflected 
    so that high scores indicate sharpening 
    tendencies)                                    288    201     282    072 
 6. I.Q.                                           507    450     079    183 
 7. Extraversion                                   100    070     092   -017 
 8. Neuroticism                                   -138   -131     006   -065 
 9. Maths—test 1                                   576    605     035    055 
10. English—test 1                                 476    320    -099    640 
11. Geography—test 1                               507    511     050    075 
12. Maths—test 2                                   602    612     078    052 
13. English—test 2                                 479    343    -107    605 
14. Geography—test 2                               473    508     096   -068 

Note: Factor I   = general intelligence 
      Factor II  = field independence 
      Factor III = levelling-sharpening 
      Factor IV  = verbal ability 


TABLE 3 
FACTOR PATTERN OF THE TESTS—REDUCED CORRELATION MATRIX 

                                                          Factor 
Test                                                I      II     III     IV 
 1. Embedded figures (field independence)          524    039     297    204 
 2. Levelling-sharpening 1                         413    899     009    040 
 3. Levelling-sharpening 2                         410    911     006    041 
 4. Levelling-sharpening 3                         318    286     100   -029 
 5. I.Q.                                           604    109     099    783 
 6. Maths (composite)                              726    103     341   -086 
 7. English (composite)                            565   -001     230    173 
 8. Geography (composite)                          602    090     264    014 
 9. Neuroticism                                   -186    013    -065   -118 
INDIVIDUAL DIFFERENCES IN CHILDREN'S 
PREFERENCES AMONG RECENT PAINTINGS 


By R. BELL 
(University of Western Australia) 


AND G. BELL 
(Victorian College of the Arts) 


Summary. A study of preferences among recent paintings was conducted with 86 Melbourne 
schoolchildren. Although dimensions of preference were found which were similar to those 
identified by children in other studies (i.e., representation, colour, and complexity), differences 
by sex and grade were found only for colour and complexity. In addition, preference for 
representational painting, often found in studies encompassing a range of periods, was not noted 
for these more recent paintings. 


INTRODUCTION 


Very little is known about children’s appreciation of painting, although this plays a 
larger part in their visual experiences than has been played previously by contemporary 
paintings for children. This occurs not only in the specific areas of art education but also 
generally in graphic design areas, such as advertising, which are related to recent movements 
in painting. In fact, very little is known about appreciation of recent painting generally. 
Cameron (1971) reported on a large study carried out in Toronto of paintings from all periods 
but which included some 20th century paintings. The recent non-representational paintings 
were least preferred, confirming the long established belief that there is a lag of about 50 
years in the acceptance of art innovation. However it was recognised that this study did not 
include paintings of the 1960s which provided some quite different trends, particularly with 
respect to the representational/non-representational distinction. 


In an extensive study of children's preferences in painting, Rump and Southgate (1967) 
found that children strongly preferred representational paintings, with the reasons for choice 
changing with age. Younger children (7 years old) based their preference on the object 
represented whereas the older children (11 and 15 years old) were influenced more by com- 
position. Colour was a factor in choice at all age levels though more so for 15-year-olds. 
In a series of studies Gardner and others (1970, 1972, 1975) have looked at age differences in 
children’s sensitivity to art. The emphasis in these studies was on the skill children used in 
evaluating artistic products such as paintings. In general they found differences between 4- 
to 7-year-olds, 8- to 12-year-olds and 14- to 16-year-olds, labelling the response styles 
‘immature’, ‘intermediate or transitional’, and ‘mature’ respectively. Both these studies 
and that of Rump and Southgate did not include any recent paintings. A major aim of this 
study was thus to obtain data on children's preferences among recent paintings. Previous 
studies have also assumed that the children were able to verbalise the constructs they were 
using to evaluate the paintings. It was possible that the differences found were differences 
in verbal response rather than basic differences in constructs, and consequently the other 
major thrust of this study was to examine differences among the constructs underlying simple 
preference responses by multi-dimensional scaling. 


METHOD 

Sample 

Twenty-two fourth graders (11 boys, 11 girls; average age 10 years), 22 sixth graders 
(12 boys, 9 girls; average age 12 years), 21 eighth graders (11 boys, 10 girls; average age 14 
years) and 21 tenth graders (10 boys, 11 girls; average age 16 years) were selected from six 
Melbourne metropolitan schools (three primary and three secondary) to be representative of 
schools in the Melbourne area. Six to eight children were selected randomly at each grade 
level within each school for testing. 
Materials 

Picture postcard size reproductions of the works of a number of 20th century painters 
were assembled and a selection of a representative group of reproductions made biased 
slightly towards more recent works. The selection was submitted to a panel of five art 
experts, and after some adjustment, a final group of 12 reproductions decided upon. The 
titles of each work are given in an appendix. For brevity the works will be referred to by 
the artist's name, i.e. Magritte, Johns, Vasarely, Duchamp, Matisse, Mondrian, Warhol, 
Klee, Hamilton, Stella, Picasso and Pollock. 


The reproductions were mounted in transparent pockets on a blackboard with a letter 
below each pocket. The reproductions could thus easily be removed and 
re-ordered. 


Procedure 

Each group of 6-8 subjects were given sheets of paper with the numbers 1-12 printed 
thereon, with a space beside each number. Having ensured that all subjects could see the 
display, we then asked them to select the painting they liked best and write the letter beneath 
that painting beside the number 1 on their sheets of paper. This was repeated for the painting 
they liked second, third, and then children were asked to complete their order of preference 
down to number 12, the painting they liked least. The procedure took up to 15 minutes 
during which time close supervision was maintained to ensure no duplications were made 
and no pictures were missed. After each administration, paintings were re-ordered in a 
random fashion. 


RESULTS 
The matrix of individuals by rankings was analysed by the Eckart-Young decomposition 
approach of the individual differences model of Tucker and Messick (1963). Four dimensions 
were selected using the mean square ratio approach of Tucker (1968) moderated by an 
informal scree test. 

As indicated by Tucker and Messick (1963), and also Pennell (1972), the first dimension 
contained the average rankings, while the other dimensions indicated the ways in which the 
individual rankings varied from this mean. Table 1 shows the dimension loadings for the 
paintings arranged in order of overall preference (dimension 1 loadings). 
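
The claim that the first dimension carries the average rankings can be illustrated with a small sketch. The rankings below are hypothetical (four children ranking four paintings), the function name `first_dimension` is an assumption, and power iteration is used here only to keep the example self-contained; it recovers the leading term of the Eckart-Young decomposition, whereas a full analysis would take several dimensions from a singular value decomposition routine.

```python
import math


def first_dimension(matrix, iterations=100):
    """Leading singular triple of `matrix`, found by power iteration.

    Rows are children and columns are paintings. Returns
    (sigma, child_loadings, painting_loadings); sigma * u * v' is the best
    rank-one approximation to the matrix in the least-squares sense.
    """
    n_cols = len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v, then v = A' u, renormalised at each step.
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        w = [sum(row[j] * u[i] for i, row in enumerate(matrix))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


# Hypothetical data: each row is one child's ranks (1 = liked best).
rankings = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4], [1, 2, 4, 3]]

sigma, child_loadings, painting_loadings = first_dimension(rankings)
mean_ranks = [sum(col) / len(rankings) for col in zip(*rankings)]
print([round(x, 3) for x in painting_loadings])
print(mean_ranks)
```

Where the children broadly agree, as in these invented data, the painting loadings on the first dimension fall in the same order as the mean ranks, and the remaining dimensions describe departures from that consensus.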


TABLE 1 
FACTOR LOADINGS FROM RANKED PREFERENCES 

                           Factors 
Artist          1        2        3        4 
Johns          4.82     2.14    -0.59    -0.68 
Vasarely       4.84     0.17     1.71     2.16 
Pollock        4.88     0.57    -2.23     0.86 
Stella         5.10     2.03     1.37     0.07 
Picasso        5.40    -0.18    -1.75     0.99 
Magritte       6.24    -1.35     0.93    -1.00 
Matisse        6.95    -0.96     0.07    -0.53 
Klee           7.03     1.00    -1.45    -1.05 
Duchamp        7.34    -1.68    -0.59     1.72 
Hamilton       8.03    -2.7      0.45    -0.64 
Mondrian       8.29     1.78     1.07     0.73 
Warhol         8.97    -0.03     0.40    -1.48 


The analysis also gave dimensional loadings for each of the children. Since these 
dimensions were orthogonal (essentially principal components), analyses of variance were 
carried out on these loadings for each of the four dimensions with both sex and grade as main 
effects. A summary of the results of these analyses is shown in Table 2. 


Mean scores, where significant differences were found, are shown in Table 3. 
TABLE 2 

SUMMARY OF ANALYSIS OF VARIANCE (F-RATIOS) FOR LOADINGS BY SEX 
AND GRADE OF CHILDREN 

                          F-ratio 
Dimension       Sex       Grade      Sex x Grade 
    1           0.00      2.28          1.06 
    2           8.61**    6.62**        1.08 
    3           5.71*     2.02          0.13 
    4           0.84      8.29**        0.29 

* P<.05     ** P<.01 


TABLE 3 

MEAN VALUES OF LOADINGS ON DIMENSIONS WHERE SIGNIFICANT 
DIFFERENCES WERE FOUND 

                        Dimensions 
                1        2        3        4 
Grade 
  4             --      0.09      --      0.85 
  6             --     -0.40      --     -0.27 
  8             --     -0.13      --     -0.24 
  10            --      0.76      --     -0.32 
Sex 
  Boys          --      0.36    -0.25      -- 
  Girls         --     -0.20     0.26      -- 


It could be seen that no significant differences were found for dimension 1. This was 
to be expected since as stated previously, dimension 1 contained overall or average effects. 
Significant variation was found for both sex and grade on dimension 2, significant variation 
for sex on dimension 3, and significant variation for grade on dimension 4. 


DISCUSSION 


The loadings for the first dimension showed that within the context of 20th century 
painting, children's preferences cannot be simply accounted for in terms of preferences for 
representational rather than non-representational approaches to painting. While the less 
representational paintings of Johns, Vasarely, and Pollock were preferred to the more 
representational works of Warhol and Duchamp, children also showed a preference for Stella 
but not for Mondrian. (Interestingly, Mondrian was similarly rejected in the Toronto study 
of Heinrich, 1969.) The two most preferred paintings, the Johns and the Vasarely, although 
both non-representational, could not be considered as particularly similar in terms of the 
range of colours, the application of paint, or their overall compositions. Thus we could not 
relate the other overall criteria of Rump and Southgate (1967), colour, and composition, to 
these general findings. Indeed the differences among paintings preferred (and among those 
not preferred) indicated the multi-dimensional nature of the preferences. 


The second dimension was bipolar in nature with Stella and Johns at the positive extreme 
and Hamilton and Duchamp at the negative end. The contrast between these pairs was 
essentially one of brightness of colours versus more sombre tones, though the Warhol 
example was something of an exception to this. Boys preferred the brighter colours and girls 
the more sombre, and while the Grade 10 children preferred the brighter colours, the Grade 6 
children preferred the more sombre tones. 

Thus while the non-representational/representational distinction was important for all 
children, the most important source of differences among children's preferences related to 
constructs involving colour. This confirmed similar findings of Rump and Southgate (1967), 
and Valentine (1962). 


The third dimension with Vasarely and Stella at the positive extreme and Pollock and 
Picasso at the negative was identified as differentiating between simple, geometrically well 
ordered pictures and more complex pictures with a less obvious sense of order. No significant 
differences between grades were found, but girls were found to prefer the simple geometric 
paintings, and boys the more complex paintings with less apparent order. 


Taking the results for both dimensions two and three as shown in Figure 1, it can be seen 
that for girls there is a preference for work defined by the paintings of Warhol, Matisse, 
Magritte and Hamilton, and for boys there is preference for the type of work exemplified by 
that of Johns, Klee and Pollock. This could be seen as both a Figurative/Abstract distinction, 
and a distinction between the ‘flatter’ painting style of the former group and the more 
‘painterly’ style of the latter. 


Previous studies, such as Rump and Southgate (1967) and Munro et al. (reported in 
Valentine, 1962) have only looked at sex differences with respect to subject matter of 
representational painting. It would seem that sex differences may extend to the style of the 
painting as shown by the findings for dimensions 2 and 3. 


FIGURE 1 


CONFIGURATION OF PAINTINGS AND GROUP MEAN LOADINGS ON 
DIMENSIONS TWO AND THREE 


The fourth dimension was seen as differentiating preference for paintings which were 
abstract and of a precise complexity with respect to the organisation of areas and space. 
Vasarely and Duchamp were examples of this whereas Warhol’s work (Four Campbell’s Soup 
Cans) was the opposite of this with overt content and simple organisation. 


The Grade 4 children responded to this abstraction and complexity whereas the Grade 6, 
8 and 10 children were similarly attracted to the more figurative less visually complex 
paintings. 

This indicates that perhaps younger children do respond to complex attributes of 
paintings such as composition as indicated by Whorley (in Meier, 1933). Findings that 
younger children do not give ‘composition, technique’ etc. as reasons as in Littlejohns and 
Needham (1933), Rump and Southgate (1967), and Valentine (1962) may have reflected the 
children’s inability to verbalise these constructs. 


CONCLUSIONS 


Children were found to construe 20th century paintings in terms of representation, 
colour, and complexity with respect to preference. These dimensions were similar to those 
found in previous studies. 

However, preference for representational over non-representational paintings, often 
found in studies encompassing a range of periods in painting, was not found for these more 
recent paintings. 

Differences by grade and sex were noted with respect to colour and complexity, and 
further research is indicated to determine how specific these findings are and what implication 
they might have for aesthetic education. 
APPENDIX 


Paintings used in the study. 

Johns        Zero through Nine (1961) 
Vasarely     Tridim-T (1969) 
Pollock      Number Eight (1952) 
Stella       Hyena Stomp (1962) 
Picasso      Three Musicians (1921) 
Magritte     Victory (1939) 
Matisse      Odalisk with Tambourine (1926) 
Klee         The Puppet Theatre (1923) 
Duchamp      The Large Glass (1915-1923) 
Hamilton     (A) Together Let Us Explore The Stars (1962) 
Mondrian     Composition in Red Yellow and Blue (1921) 
Warhol       Four Campbell’s Soup Cans (1965) 
SEX DIFFERENCES IN CLASSROOM BEHAVIOUR OF INFANTS: 
THE VIEWS OF TEACHERS AND PUPILS 


By D. HARTLEY 
(Department of Education, University of Dundee) 


Summary. Within each classroom in two urban infant schools, views of the behaviour of 
boys and girls were obtained from classmates and the teacher. The behaviour of boys was 
regarded as less appropriate than that of girls, particularly by classmates. In the majority of 
classes there were significant positive correlations between the teacher’s and pupils’ views. 
Despite the different social class composition of the two schools, the results in both were 
similar. 


INTRODUCTION 


There is now a growing body of evidence which shows that teachers rate the classroom 
behaviour of boys as being less appropriate than that of girls (Davidson and Lang, 1960; 
Douglas, 1964; Kellmer-Pringle, 1966; Davie et al., 1972; Brandis and Bernstein, 1974; 
Ingleby and Cooper, 1974; and Stevenson et al., 1976). The views of the pupils themselves 
have been curiously absent, perhaps because of the difficulty of conducting research with very 
young children who are unable to complete questionnaires and rating scales in written form. 
The evidence so far suggests that pupils perceive that their teachers show more approval of 
the behaviour of boys than of girls (Davis and Slobodian, 1967; Meyer and Thompson, 1956). 
The present study focuses on the individual classroom and takes account of both teachers’ 
and pupils’ views of the classroom behaviour of boys and girls in two large infant schools. 


The two schools were situated in an urban setting in the south-west of England. School 
A was a Social Priority School located in a large pre-war council estate and virtually all the 
pupils had fathers whose occupations were classified as manual. Except for one man, the 
14 teachers were women, most of whom were in their twenties. School B’s catchment area 
was contiguous to that of School A but the social characteristics of the area were different. 
Householders were owner-occupiers and the area had a recent history of voting Conservative. 
Of the 15 teachers, two were men. The regime of the school was stricter and more traditional 
than in School A and the staff were some 10 to 15 years older than the teachers in the other 
school. The size of the enrolment in both schools was similar: 397 in School A, 383 in 
School B. 


The broad aim of the study was to compare teachers’ and pupils’ views of the classroom 
behaviour of boys and girls. Whilst the major emphasis was on sex differences, the nature 
of the problem was the comparison of teachers’ and pupils’ ratings of behaviour. 


METHODS 

The teachers’ view 

Each teacher rated every pupil in her own class against five bipolar scales. These 
scales were arranged in the form of a seven-point, Osgood-type semantic differential. The 
scales were: gentle-rough; tidy-untidy; noisy-quiet; immature-mature; and able-unable 
to concentrate. These scales were themselves derived from a content analysis of interview 
transcripts. The interviews were held with each teacher who was asked to state any differ- 
ences she saw between boys and girls as pupils in her school. 


The pupils’ view 

In order to be able to compare the teachers’ ratings with the views of the pupils, it was 
necessary to preserve the 10 behavioural categories which constituted the five rating scales 
just mentioned. However, since few of these very young children could have completed a 
rating scale in written form, another way of finding out how the pupils saw the classroom 
behaviour of boys and girls was sought. The use of the ‘Guess Who’ test, a sociometric 
device which could be administered orally, was chosen. Each pupil is individually asked 10 
questions, each question referring to one of the 10 behavioural categories stated above. 
Thus: “Can you tell me the name of anyone in your class who is often NOISY?” The 
pupil then nominates those pupils who fit that description. There is no limit to the number 
of names which can be listed. The second question: “Can you tell me the name of anyone 
in your class who is often QUIET?” And so it goes until the pupil has been asked all 10 
questions. Scores are assigned on the basis of the number of nominations received by each 
pupil on each of the 10 behavioural categories. Thus each pupil has 10 scores, one for each 
category. These scales cannot be assumed to be more than ordinal scales, producing rank 
orders. 
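
The scoring of the ‘Guess Who’ nominations can be sketched as follows. The pupils' names, the answers and the function name `nomination_scores` are all hypothetical, and only two of the 10 behavioural categories are shown; the sketch simply counts, for each category, how many nominations each pupil received from classmates.

```python
from collections import Counter


def nomination_scores(responses, categories):
    """Count the nominations received by each pupil in each category.

    responses  -- one dict per interviewed pupil, mapping a category to the
                  list of classmates named in answer to that question
    categories -- the behavioural categories asked about
    """
    scores = {category: Counter() for category in categories}
    for answer in responses:
        for category in categories:
            # A pupil who names nobody for a category adds nothing to it.
            scores[category].update(answer.get(category, []))
    return scores


# Hypothetical answers from three interviewed pupils.
responses = [
    {"noisy": ["Tom", "Ann"], "quiet": ["Sue"]},
    {"noisy": ["Tom"], "quiet": ["Sue", "Ann"]},
    {"noisy": [], "quiet": ["Sue"]},
]

scores = nomination_scores(responses, ["noisy", "quiet"])
print(scores["noisy"].most_common())
print(scores["quiet"].most_common())
```

Because the counts are treated as no more than ordinal, only their order within the class is carried forward to any later comparison.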


As terms like ‘immature’ could not be used with young children, before the children 
were interviewed their respective teachers were observed in the classroom to determine the 
words they used for the 10 behavioural categories. Thus ‘immature’ may become ‘acting 
silly’ or ‘behaving like a baby’. ‘Mature’ may become ‘sensible’ or ‘grown up’. 
‘Concentration’ was often referred to as ‘getting on with work nicely’. The children were 
never interviewed in their classroom, but in the corridor where there would be less chance of 
‘prompting’ from classmates, and where the surroundings would not be strange to the 
pupil. It should be noted that the children had seen the researcher before; he was not a 
total stranger. 


Comparing the views held by teachers and pupils 
The scores received by pupils on the teacher's ratings and the pupils’ nominations were 
ranked as follows. For the teacher's ratings, each pupil was rated against each of five scales: 
for example, 

        noisy   __   __   __   __   __   x    __   quiet 
                1    2    3    4    5    6    7 


The choice marked here could be interpreted in two ways: the higher the score, the more 
‘quiet’; or the lower the score, the more ‘noisy’. This would give a ‘quiet’ score of 6 and 
a ‘noisy’ score of 2. In this way, each pupil could be assigned a score on each of the 10 
behavioural categories mentioned in the five bipolar scales. Each pupil in the class could 
then be given a rank for each category. Similarly, each pupil in the class was assigned a 
rank score on the basis of the number of nominations received from classmates for a particu- 
lar category; the higher the number the higher the rank. Then the two sets of ranks were 
correlated for each behavioural category, using Spearman's rank-order correlation coefficient. 
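
The procedure just described can be sketched in a few lines. The pupils' scores below are hypothetical, the function names are assumptions, and the treatment of ties (tied values share their average rank, and the coefficient is computed as the product-moment correlation of the ranks) is an assumption, since the handling of ties is not stated here.

```python
import math


def category_scores(mark, points=7):
    """Turn a mark on the noisy-quiet scale into ('quiet', 'noisy') scores."""
    return mark, points + 1 - mark


def average_ranks(values):
    """Rank values from 1 (lowest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank-order correlation between two sets of scores."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A mark in the sixth position gives a 'quiet' score of 6 and a 'noisy' score of 2.
print(category_scores(6))

# Hypothetical class of five pupils: teacher's 'noisy' scores and the number
# of 'noisy' nominations each pupil received from classmates.
teacher_noisy = [2, 5, 3, 6, 1]
nominations = [1, 4, 2, 7, 0]
print(spearman(teacher_noisy, nominations))
```

In these invented data the teacher and the classmates order the five pupils identically, so the coefficient is at its maximum; in real classes the two orderings would agree only in part.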


RESULTS 

The teachers’ view 

At the classroom level of analysis, Table 1 shows the extent of the predicted sex differenti- 
ation made by teachers in both schools. The number of significant results is not large, 
particularly in the classes for the younger children. The overall rating trend, however, 
shows boys to be rated at the more ‘unfavourable’ extreme of each scale, although the 
opposite occurs in some reception classes. If one compares the results for the two schools, 
the rating patterns are very similar, although there are slightly more significant differences 


TABLE 1 

MANN-WHITNEY U TEST: SEX DIFFERENCES IN INDIVIDUAL TEACHER RATINGS 
OF PUPILS IN EACH CLASS 

                                 Teachers rating boys differently from girls 
                                                (P < 0.05) 
                                    School A              School B 
Scale                               N    (%)              N    (%) 
able/unable to concentrate          2   (14.3)            1    (6.7) 
immature/mature                     4   (28.6)            3   (20.0) 
tidy/untidy                         3   (21.4)            5   (33.3) 
gentle/rough                        6   (42.9)            7   (46.7) 
noisy/quiet                         5   (35.7)            5   (33.3) 

Notes: 1. Teachers in School A (N=14). Teachers in School B (N=15). 
       2. All results are in the ‘expected’ direction: girls rated more 
          favourably than boys. 

reported for School A, especially on the noisy-quiet and immature-mature scales; boys 
being rated the less favourably. 
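
The Mann-Whitney U statistic underlying the sex comparisons can be sketched as follows. The ratings are hypothetical, the function name `mann_whitney_u` is an assumption, and tied ratings are given their average rank; the sketch stops at the U values and leaves the comparison with critical values aside.

```python
def mann_whitney_u(group_a, group_b):
    """U statistics for two independent groups of ratings.

    Ratings are pooled and ranked from 1 (lowest), tied ratings sharing
    their average rank. Returns (u_a, u_b), where u_a + u_b = n_a * n_b.
    """
    pooled = list(group_a) + list(group_b)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and pooled[order[end + 1]] == pooled[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # average rank for the tied run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical 'quiet' scores given by one teacher to four boys and four girls.
boys = [2, 3, 3, 4]
girls = [5, 6, 6, 7]

u_boys, u_girls = mann_whitney_u(boys, girls)
print(u_boys, u_girls)
```

Here every girl is rated as quieter than every boy, so the U value for the boys is zero, the most extreme separation possible for groups of this size.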


The pupils’ view 

The Guess Who? nominations made by the pupils of their classmates’ behaviour reveal 
a pattern which accords with that of the teachers’ ratings (Table 2). In both schools less sex 
differentiation is made in classes for the younger children. Boys are nominated more often 
for revealing ‘negative’ forms of behaviour (rough, untidy, immature, noisy and unable to 
concentrate) whilst the girls receive more nominations for displaying the ‘positive’ opposites 
of these types of behaviour. Again, as with the teachers’ ratings, boys are consistently 
nominated as being significantly rougher than girls. The degree of sex differentiation is 
again slightly higher in School A, the working class area school. 


TABLE 2 

MANN-WHITNEY U TEST: SEX DIFFERENCES IN PUPILS’ GUESS WHO? 
TEST NOMINATIONS 

                            Number of classes in which pupils 
                            nominated boys differently from 
                            girls (P < 0.05) 
                              School A            School B 
Behaviour                     N    (%)            N    (%) 
Gentle                        9   (58.3)         11   (78.6) 
Rough                        11   (91.7)         12   (85.7) 
Concentrates                  4   (33.3)          1    (7.1) 
Lacks concentration           6   (50.0)          3   (21.4) 
Noisy                         7   (58.3)          5   (35.7) 
Quiet                         4   (33.3)          6   (42.9) 
Immature                      6   (50.0)          6   (42.9) 
Mature                        5   (41.7)          2   (14.3) 
Tidy                          4   (33.3)          7   (50.0) 
Untidy                        4   (33.3)          4   (28.6) 

Notes: 1. Classes in School A (N=12). Classes in School B (N=14). 
          One teacher from each school refused to allow her pupils to 
          undertake the Guess Who? test, explaining that it would be 
          tantamount to ‘telling tales’. 
       2. All results in the ‘expected’ direction: girls nominated more 
          frequently on the ‘positive’ categories, boys on the ‘negative’ 
          categories. 



Comparison of the teacher's ratings and the pupils’ nominations 

Table 3 shows the existence of a significant positive correlation between the teachers’ 
ratings and the pupils’ nominations of the classroom behaviour of individual pupils. The 
tendency is for these to occur more in the classes for the older pupils. The results further 
suggest that this correlation is greater for those types of behaviour which are readily observ- 
able—noisiness, roughness, untidiness and not concentrating. And it must be suspected 
that it is these types of behaviour which the teacher may criticise in the classroom as being 
less appropriate. The lowest and negative correlations obtain for immature and mature 
types of behaviour. This perhaps is not surprising, for these are more evaluative terms 
rather than types of behaviour. It must be suspected that teachers may privately retain 
these evaluations rather than publicise them in the classroom. 


DISCUSSION 


The results for both schools reveal a number of similarities. Firstly, the overall trend 
is for boys to be rated and nominated as behaving more ‘inappropriately’ than girls. It is 
reported elsewhere (Hartley, 1977) that the teachers in both schools rated quiet, gentle, tidy, 
mature and concentrating behaviour as being associated more with the concept of the 
SUCCESSFUL FIRST SCHOOL PUPIL than their semantic opposites. 
TABLE 3 

SPEARMAN RANK CORRELATION COEFFICIENT: TEACHERS’ RATINGS AND PUPIL 
GUESS WHO? TEST NOMINATIONS OF INDIVIDUAL PUPILS IN EACH CLASS 

                        Classes in School A (N=12)            Classes in School B (N=14) 
                                             no. of                                no. of 
                                             significant                           significant 
                                             +ve                                   +ve 
Behavioural                                  correlations                          correlations 
category               median r_s  range     (P<0.05)        median r_s  range     (P<0.05) 
Gentle                    407     030/728       6               290    -138/573       7 
Rough                     529     158/616       9               393     116/719       9 
Concentrates              475     155/651       9               423     106/614      10 
Lacks concentration       320    -062/744       6               368     057/741       9 
Noisy                     448     287/705       1               533     190/711      13 
Quiet                     311    -083/637       6               374     088/551       9 
Immature                  371    -307/881       7               142    -609/590       6 
Mature                    482    -443/702       8               321    -704/600       7 
Tidy                      477    -430/896      10               458     176/586      12 
Untidy                    432    -215/725       8               457    -032/766      11 

Note: decimal places omitted. 


Secondly, the Guess Who? test nominations by the pupils are generally more significant 
than are the teachers’ ratings. It was previously mentioned that the pupils’ nominations are 
in the form of frequency counts, and that a pupil who received, say, eight nominations for 
being ‘rough’ could not be said to be eight times rougher than a pupil who was nominated 
only once. Thus the pupils’ nominations could have exaggerated the degree to which a 
pupil usually revealed a particular form of classroom behaviour. 


Thirdly, the results in the classes for the youngest children reveal less sex differentiation
than do those in the classes for the third-year pupils. This may partly be explained by the 
possibility that teachers of the youngest children may choose to ‘ ignore’ pupil misdemean- 
ours, especially those of boys, lest they provoke antagonism on the part of pupils towards 
school at this age. As the pupil gets older, and ‘ should know better ’, these misconducts, 
which previously went ignored, may now be called into question and corrected. It would 
appear that girls have the greater willingness to accept the teacher’s correction. 


A fourth similarity is that in both schools there is for most teachers a positive correla- 
tion between the teacher’s ratings and the pupils’ nominations of their classmates’ behaviour. 
This prompts the very difficult question: are the pupils accepting the teacher’s definition of 
the classroom behaviour of groups of individual boys and girls and then stating them as 
though they were their own? The direction and causality of this relationship may only be 
imputed, not ascertained from the results reported here. To assume that it was the defini- 
tions of the teachers which caused the pupils to nominate the behaviour of other pupils in a 
similar way is to ignore a number of other possibilities. Pupils could have formed their 
own definitions using knowledge obtained from, say, the playground, the local area or, 
perhaps most probably, the previous classroom which the pupils were in. Just as it may be 
the case that teachers accept and sustain pupil identities from year to year through staffroom 
discourse and pupil record cards, so too might the pupils preserve earlier categorisations of 
their fellow pupils. The results show that it is in the third-year classes that the highest 
number of significant correlations are found. This trend may be explained as the outcome 
of the accumulated shared definitions of particular pupils by their fellow pupils and their 
teacher.


A further proviso should be made. The correlational analysis compares ranked data, 
but the raw scores upon which these ranks are based are the results of different forms of 
measurement. Whilst each teacher rated each pupil on a seven-point scale on all behavioural 
dimensions, pupils chose names and even then not all pupils were nominated on all types of 
behaviour. But this is unlikely to have affected the direction of relationship, and the relative
values of correlations between scales can still be interpreted. On the basis of the findings
here, it would still be reasonable to conclude that most teachers and pupils see the classroom 
behaviour of boys less favourably than that of girls. 
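
The proviso about ranked data can be illustrated with a minimal sketch. The figures below are hypothetical (six pupils, seven-point teacher ratings and nomination counts for ‘rough’); neither the data nor the function names come from the study. Spearman's coefficient is computed here as the product-moment correlation of the two sets of ranks, with tied scores given their average rank, so that only the ordering of pupils on each measure enters the result.

```python
from math import sqrt


def average_ranks(scores):
    """Rank scores from 1 upward, giving tied scores their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data for six pupils (not taken from the study).
teacher_ratings = [7, 5, 4, 2, 1, 3]      # seven-point scale
pupil_nominations = [8, 3, 2, 1, 0, 1]    # frequency counts

r_s = spearman(teacher_ratings, pupil_nominations)

# Any order-preserving change to the counts leaves the coefficient unchanged.
r_s_rescaled = spearman(teacher_ratings, [n * n for n in pupil_nominations])
```

Because squaring the counts preserves their order, `r_s_rescaled` equals `r_s`: the coefficient is insensitive to the fact that eight nominations do not mean "eight times rougher".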


The results indicate a number of differences between the two schools. The degree of sex 
differentiation is greater in School A. It has been found that in working class areas, parents 
tend to differentiate more between their sons and daughters than is the case in middle class 
areas (Newson and Newson, 1968). The greater sex difference in pupil behaviour in School 
A may be a consequence of this, but the between-school sex differentiation is only slight. It 
was expected that less sex differentiation would obtain in the more middle class School B. 
This expectation itself rests on Bernstein’s (1975) suspicion that ‘ new ’ middle class parents, 
with their ‘invisible’ pedagogy, seek to minimise sex differentiation when rearing their 
children, and that this may influence pedagogical practices in infant schools. The findings in 
School B do not appear to support Bernstein’s intuition. That they do not may be explained 
by statistical reasons. In the results here, the statistical correspondence between the two 
schools does not necessarily mean that the two staffs, faced with the same pupils, would rate 
their behaviour in the same way. It has been said that School B was the stricter school and 
it is to be suspected that the teachers in that school had lower thresholds for certain types of 
inappropriate behaviour than their School A counterparts. Relative to the teachers in 
School A, those in School B might have ‘ exaggerated’ the ‘inappropriate’ behaviour of 
pupils, especially of boys, when they came to rate them. An example of this is provided by a 
teacher from School A who visited School B for a morning session. She was interviewed 
later in the day: 


Teacher: Of course the Headmistress was very keen on it anyway (that pupils line up in 
silence). She told somebody off when I was there and he wasn't even doing 
anything that would have been remarked at at our school. 


DH: Did you see any other differences in behaviour that you would call one thing
and School B teachers would call another? 


Teacher: Well one did. One teacher was talking to me about problem children and she 
said—you know . . . she had got several children in her class that she would 
like to put in a little alcove of their own because they were such a nuisance, 
but from my brief visit in her room, which of course is a bit false, but I didn't 
see any children in there that were in the least a nuisance, not in the least. 


These exploratory studies could be extended to consider the commonsense definitions 
held by the pupils themselves of boys and girls as pupils. In particular, they could find out 
whether boys see boys as pupils any differently than girls see boys as pupils; and similarly 
for girls being seen as pupils. Until such studies are carried out, the findings here should be 
treated with caution.


ATTENTION AND ACTIVITY IN THE YOUNG CHILD


By S. TYLER, HELEN FOY AND CORINNE HUTT
(Department of Psychology, University of Keele) 


Summary. The attentive capacities of young children were studied in terms of spans of activity 
and of attention. The first study examined activity spans in four types of pre-school establish- 
ment and the second study examined attention in the infant school. In both pre-school and 
infant school the influence of the adult in focusing the children’s attention was demonstrated. 


INTRODUCTION 


Historically, the subject of attention has received much consideration in psychology. 
Its importance in learning processes was noted by James (1902) who wrote: “Whether the
attention comes by grace of genius or by dint of will, the longer one does attend to a topic
the more mastery of it one has.” Since that time, psychological studies of attention have
attempted to distinguish orientation and attention, or between those stimuli which elicit 
attention and those which maintain it. Recently, in their review Rosenthal and Allen (1978) 
suggest that cognitive theorists usually separate attention into at least two factors: an 
intensive factor called alertness and a selective factor which refers to the choice of certain 
stimuli for further processing. But, of greater concern to those involved in the education of 
the young child, is the question of how long, once the stimulus or activity has been selected, 
attention to it can be maintained. Standing, commenting on the work of Maria Montessori, 
writes: 

“One day ... [the child] ... will choose some occupation (very likely one he has trifled
with many times before) and settle down seriously to work at it with the first spontaneous
spell of concentration that he has ever shown. This is the beginning of his salvation ...
Concentration is the key that opens up to the child the latent treasures within him.”
(Standing, 1957, p. 152.)


A prima facie case may be made that the longer a child is able to attend the more efficient 
his information-processing and hence the better his performance. Such a case is supported 
by empirical evidence that attentiveness is correlated with school achievement and task 
performance in 11-year-olds (Lahaderne, 1968), 9-year-olds (Cobb, 1972) and 6-year-olds 
(Samuels and Turnure, 1974). Attention also appears to increase with age (van Alstyne, 
1932; Gutteridge, 1935; Crow and Crow, 1963; Mussen et al., 1974) and hence longer 
attention spans may be considered a sign of greater maturity, although other studies have 
failed to find such a relationship (e.g. Clark et al., 1969; Lunzer, 1958; Tizard, 1976). 
Furthermore, the main difficulty that brain-damaged and hyperkinetic children have lies in 
their inability to concentrate (Laufer et al., 1957; Hutt and Hutt, 1964) and their distractibility 
is a major source of disruption in the classroom. 


Thus, both educationally and socially, longer attention spans seem to be desirable. In 
view of the assertion by Lahey and Johnson (1978) that “pre-schoolers are easily distracted
and have difficulty focusing their attention on any one activity for a long period of time, ...”
(p. 55) we decided to examine the attentiveness of pre-school children in order to see whether 
certain contexts were more conducive to concentration than others and also to evaluate the 
role of adults in focusing children's attention. 


MEASURES OF ATTENTION 


Behaviour may be divided into events in which the occurrence of the behaviour is of 
principal importance, and states in which temporal duration is the critical feature (Altmann, 
1974). Measurement of attention or activity spans is dependent upon a behavioural classifi- 
cation in terms of states. Within this system the behaviour may be categorised by its 
morphology or function on the one hand, or by the stimulus encountered on the other. 
Activity spans are identified with the former, spans of attention with the latter (Hutt and
Hutt, 1970).


Attention span is defined as the duration of continuous engagement with the same
stimulus. Generally attention spans are of short duration, and most easily monitored with 
reference to the visual modality of the subject. Activity span is defined in terms of the duration 
of the behavioural categories employed, whether the behaviour is classified in terms of its 
morphology (e.g. running, hitting) or its function (e.g. sand play, water play). Activity spans 
are generally longer than attention spans. 


Study I: Activity spans of pre-school children 

Children were studied in four types of pre-school establishment: nursery school, nursery 
class, day nursery and playgroup. Nursery schools and classes were considered separately 
since in the local authority that managed these establishments it had been customary practice 
to staff nursery classes with nursery assistants. Thus, in the classes we studied the teacher-in- 
charge was the head of the infant/junior school. 


METHOD 


Sample: Forty-eight children, 6 boys and 6 girls from each type of establishment, were 
selected at random from within the age range 3 years 9 months to 4 years 6 months. 


Procedure: Children were observed during periods of free play in the morning. Each 
child was observed for two separate periods of 30 minutes each. Behaviour was time-sampled 
at intervals of 15 seconds using a convention of predominant-activity sampling (Tyler, 1979), 
and recorded on a checklist using pre-selected categories. For the analysis, the number of 
consecutive cells spent in any one activity was noted. If during an activity there was a 
‘break’ of more than one time-interval (i.e. child talking or looking elsewhere), it was
considered that one span had ended before, and another begun after, the break. Presence of 
adults was also noted. Mean activity spans in time-intervals were calculated for each group 
of children and then converted into seconds. 
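
The span-counting rule described above can be sketched as follows. This is only an illustration under stated assumptions: the checklist is represented as a list with one activity code per 15-second cell, a break cell is coded `None`, a single break cell is bridged without being counted toward the span, and a change of activity always ends the current span. The function names and the sample record are hypothetical.

```python
INTERVAL_SECONDS = 15  # length of one time-sample cell, as in the procedure


def activity_spans(cells, max_break=1):
    """Return (activity, duration in seconds) for each span in a record.

    cells     -- one activity code per cell; None marks a break cell
                 (assumed coding, e.g. child talking or looking elsewhere)
    max_break -- longest run of break cells that does not end a span
    """
    spans = []
    current, count, gap = None, 0, 0
    for code in cells:
        if code is None:
            gap += 1
            # A break longer than max_break cells closes the current span.
            if gap > max_break and current is not None:
                spans.append((current, count))
                current, count = None, 0
            continue
        if code == current:
            count += 1
        else:
            # A different activity (or the first one) starts a new span.
            if current is not None:
                spans.append((current, count))
            current, count = code, 1
        gap = 0
    if current is not None:
        spans.append((current, count))
    return [(activity, n * INTERVAL_SECONDS) for activity, n in spans]


def mean_span(spans):
    """Mean duration in seconds over a list of (activity, duration) spans."""
    return sum(duration for _, duration in spans) / len(spans)


# Hypothetical two-minute record: the one-cell break in sand play is bridged,
# the two-cell break in water play splits it into two spans.
record = ["sand", "sand", None, "sand", "water", None, None, "water"]
spans = activity_spans(record)
```

For this record the sketch yields one sand-play span of three cells and two separate water-play spans of one cell each, since only the second break exceeds one time-interval.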


RESULTS 

The mean activity spans obtained in the different establishments are shown in Table 1.
It may be seen that the overall span is greatest in the nursery school and lowest in the day nursery.


TABLE 1


MEAN ACTIVITY SPANS OF CHILDREN IN FREE PLAY IN DIFFERENT FORMS
OF PRE-SCHOOL PROVISION


Pre-school         Without Adult    With Adult    % Increase    Overall
Provision          (seconds)        (seconds)     with adult    (seconds)
Nursery School     124.4            186.0         49.5          130.8
Nursery Class      126.2            161.4         27.9          126.8
Playgroup          107.3            195.9         82.6          114.2
Day Nursery        98.3             209.9         113.5         110.7


When, however, account is taken of the presence or absence of an adult, it may be seen that
the spans are considerably increased when an adult is present (ANOVA: F = 27.12, df = 1,69,
P<0.001).
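
As a check on the ‘% Increase with adult’ column, the figures can be recomputed from the two span columns of Table 1. The sketch below assumes the increase is expressed relative to the span without an adult; the variable and function names are illustrative only.

```python
# Mean activity spans in seconds, read from Table 1:
# (without adult, with adult) for each type of provision.
table_1 = {
    "Nursery School": (124.4, 186.0),
    "Nursery Class": (126.2, 161.4),
    "Playgroup": (107.3, 195.9),
    "Day Nursery": (98.3, 209.9),
}


def percent_increase(without_adult, with_adult):
    """Increase in span with an adult, as a percentage of the span without."""
    return 100.0 * (with_adult - without_adult) / without_adult


increases = {
    provision: round(percent_increase(*spans_pair), 1)
    for provision, spans_pair in table_1.items()
}
```

Under that assumption the recomputed values agree with the printed column (49.5, 27.9, 82.6 and 113.5 per cent), and the proportional gain is largest where the unaided span is shortest.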


Study II: Attention spans in reception classes

It is often argued that pre-school experience enables a child to settle in to a school 
regime more easily than if he had not had that experience. It may be that a child at nursery 
school learns to concentrate in spite of other activities going on around him. In fact, some 
evidence supporting this supposition was obtained in a study of children in their homes, who 
were found to be more distractible than their peers in nursery schools (Davie et al., 1975). 
We examined the hypothesis therefore, that the nature of pre-school experience has some 
effect on children's ability to attend when they commence primary school. 


METHOD 


Sample: Forty-nine children from the reception classes of six primary schools were 
observed during their second week of school; 26 children (14 boys and 12 girls) had previously
attended a nursery school or class and 23 (11 boys and 12 girls) had been at home or had
attended a playgroup for two sessions a week.


Procedure: Each child was observed during the second week of term on two occasions, 
each lasting 15 minutes. Observations were made in the morning during normal school 
activity when the child's behaviour was usually task-orientated and excluded breaks for milk
and lunch. Two similar observations of each child were made again in the three weeks prior
to the end of term. Since there is less physical activity and movement in reception classes 
than in nursery schools, it was possible to monitor attention more closely than in Study I, 
by use of a portable event-recorder. The child's behaviour was monitored by an observer 
using a 4-channel event recorder, each channel of which was used to record a different aspect 
of behaviour as follows: 


Channel 1: attention to task or activity—recorded duration of the child's orientation to 
a task; 


Channel 2: look/watch—recorded periods when the child looked elsewhere (e.g. at 
picture or wall), or watched other children; 


Channel 3: peer interaction—recorded any encounter with another child; 


Channel 4: adult participation—recorded child's attention to adult's speech or to activity 
in which adult was also involved.


Channels 3 and 4 could record simultaneously with Channel 1, or independently. Absence of
record on all four channels during the session denoted that the child was “looking around,
with no point of focus”. For all channels the child's visual orientation was taken as
determining the focus of attention. After the observations every entry on each channel of a
recording was measured and the distances translated into time.


RESULTS 


Attention spans were longest when children were involved in tasks or activities (Channel
1; mean = 15.9 seconds) as opposed to looking and watching (Channel 2; mean = 7.7
seconds) or interacting with other children (Channel 3; mean = 6.9 seconds). Simultaneous
analysis of channels 1 and 4 yielded the durations of attention when an adult was present
with the child at a task and the spans achieved when the adult was absent. The mean
attention spans for attention to task at the beginning and at the end of term, with and without
adult, are given in Table 2. Analyses of variance revealed that the influence of the adult was
a significant factor (F = 8.89, df 1, 43, P<0.005) while the time spent at the first school was
a factor that approached significance (F = 5.2, df 1, 43, P<0.075). Neither sex nor type of
pre-school experience proved to be significant.
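
One way the simultaneous analysis of channels 1 and 4 might be carried out is sketched below. The rule used is an assumption, not a description of the authors' procedure: each Channel 1 entry is treated as a (start, end) pair in seconds and counted as ‘with adult’ if it overlaps any Channel 4 entry. The traces and names are hypothetical.

```python
def overlaps(a, b):
    """True if the intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def split_task_spans(task_entries, adult_entries):
    """Split task-attention durations into (with adult, alone).

    A task entry is classed as 'with adult' if it overlaps any adult entry;
    this classification rule is an assumption made for illustration.
    """
    with_adult, alone = [], []
    for entry in task_entries:
        duration = entry[1] - entry[0]
        if any(overlaps(entry, other) for other in adult_entries):
            with_adult.append(duration)
        else:
            alone.append(duration)
    return with_adult, alone


def mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


# Hypothetical traces from one 15-minute session (times in seconds).
channel_1 = [(0, 12), (20, 31), (40, 70), (80, 86)]   # attention to task
channel_4 = [(38, 72)]                                # adult participation

with_adult, alone = split_task_spans(channel_1, channel_4)
mean_with_adult, mean_alone = mean(with_adult), mean(alone)
```

Other rules are possible, for example crediting only the overlapping portion of a task entry to the ‘with adult’ condition; the choice affects the means and would need to be stated in any replication.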


TABLE 2


ATTENTION SPANS (IN SECONDS) FOR ACTIVITIES IN THE RECEPTION CLASS
ACCORDING TO PRE-SCHOOL EXPERIENCE


                               Start of Term            End of Term
Pre-school experience          Alone    With Adult      Alone    With Adult
Nursery      Girls             16.1     20.2            17.4     24.4
             Boys              11.0     22.8            18.0     17.2
             Both              13.5     21.5            17.7     20.7
Home         Girls             16.5     13.8            20.5     24.5
             Boys              10.6     14.0            17.2     18.7
             Both              13.0     13.9            18.8     31.6


Analysis of Channel 3 revealed that periods of peer interaction were shorter at the
beginning of term (mean = 6.2 seconds) than at the end (mean = 7.6 seconds), but the difference
fell short of significance. However, in terms of frequency of interaction rather than duration,
the children with nursery experience had significantly higher scores, both at the beginning
and end of term (ANOVA: F = 5.2, df = 1,44, P<0.05).


DISCUSSION


The first study showed that the ability of young children to concentrate upon an activity 
was dependent, at least in part, upon the nature of their environment. The educational
establishments which cater for a narrower age-range of children and are also more system- 
atically organised provided the least distracting environments for the children. In playgroups 
where attendance is on a sessional basis and adults and environment are less familiar, and in 
day nurseries where there are more very young children, 4-year-olds clearly have more 
difficulty in sustaining their attention. The effect of the adult, however, in harnessing and
focusing the children’s attention is dramatic and it seems that, proportionally, this effect is 
greater the more attention is previously fragmented. Thus the greatest influence of the adult 
is observed in the day nurseries. 


In the second study too the most important factor appeared to be the involvement of the 
adult in focusing the child’s attention. Surprisingly, in view of claims previously made, the 
nature of pre-school experience had little effect on the child’s ability to attend. 


Both these studies have emphasised the significance of the adult in potentiating the 
attentional capacities of the young child. Learning cannot take place unless attention is paid 
to the relevant stimuli. Thus, the adult plays a fundamental role in enabling the child to 
deploy its attention most effectively. In view of this, the traditional practice of leaving 
children to learn through their own efforts seems a questionable one. 


ACKNOWLEDGMENTS. Our thanks are due to the many nurseries and infant schools who participated
in the studies described, and to the DES whose generous funding made the research possible.


DEALING WITH SURVEY DATA


By MURRAY AITKIN 
(Centre for Applied Statistics, University of Lancaster) 


O'MUIRCHEARTAIGH, C. A., and Payne, C. (Eds.) (1977). The Analysis of Survey 
Data. Volume 1: Exploring Data Structures. Volume 2: Model Fitting. 
Chichester: John Wiley, pp. xvi + 273, xvi + 255, £20.50 (set).


This two-volume set is an important and ambitious undertaking. In 15 chapters, the 
editors have attempted a comprehensive coverage of the theoretical and practical problems 
of survey analysis, from the construction of attitude scales and the examination of sources 
of response error, through the preparation and processing of survey data, to ‘exploratory’
methods of analysis (cluster and factor analysis, multidimensional scaling and AID) and 
formal statistical analysis procedures based on the general linear model, including path 
analysis and the log-linear model for contingency tables, and stochastic models. The 
important question of the effect of the survey design on the proper form of data analysis is 
also discussed. 


With some exceptions, the editors and contributors succeed in presenting a detailed 
‘state-of-the-art’ coverage of the major areas of survey analysis. Unfortunately, the state
of the art is unsatisfactory in some important respects, notably in the statistical basis of those
‘exploratory’ techniques discussed in the first volume. Detailed comments are made below
on each chapter, and some general conclusions drawn at the end of this review. 


With 15 chapters by 13 authors, some unevenness is inevitable. The level of statistical 
sophistication assumed of the reader varies considerably. More troubling (at least to a 
statistician) is the division of the work into the two volumes. To quote the editors' preface: 
“On the one hand those who are dealing with relatively uncharted substantive areas will
find that the techniques presented in Volume 1 are more immediately relevant as these deal
with the exploration and identification of structures in the data. On the other hand the
classical statistician or the social scientist who is concerned with hypothesis testing will find
that the model fitting and testing procedures described in Volume 2 are closer to his needs ...”
This division is troubling because many of the ‘exploratory’ techniques of Volume 1 in fact
use implicit statistical models, but the goodness-of-fit or appropriateness questions for such 
models are almost completely ignored. 


Volume 1 begins with an overview by O'Muircheartaigh. In a concise chapter of 39 
pages, he describes all the stages of a survey: the nature of variables, methods of data collec- 
tion including a discussion of non-response, sample design including estimation from 
clustered samples, and methods of analysis including exploratory and model fitting methods. 
Chapter 2 by Payne is a clear practical guide to the preparation and processing of survey 
data, covering data file preparation and checking, and general problems of computer 
analysis. 


Chapters 3, 4 and 5, on cluster analysis (Everitt), principal component and factor analysis 
(Taylor) and latent structure models (Fielding) describe widely used ‘exploratory’ methods.
The basic difficulty with cluster analysis is stated at the beginning: “A fundamental problem
in this area is the lack of a satisfactory definition of exactly what constitutes a cluster.
Because of this, most clustering techniques cannot be formulated in terms of a satisfactory
model, as can ... factor analysis. ... However, since the main aim of this chapter is
merely to acquaint readers with some particular clustering techniques and their associated
problems, the matter of definition will conveniently be ignored, and the term cluster or
group used in an intuitive sense for a collection of ‘similar’ individuals or objects. ...
[Most] cluster analysis methods are essentially non-statistical in the sense that they have no
associated distribution theory or significant tests, and so are unable to relate from sample to
population. Indeed many clustering techniques treat the data at hand as the population and
so there is no question of making inferences as to the underlying structure in some hypothetical
population from which the data was drawn.”


This clearly raises serious difficulties. How do we know that a particular configuration 
of clusters produced by a numerical algorithm would also have been produced by a different 
random sample from the same population or by a different algorithm on the same sample? 
What confidence can be placed in the existence of real clusters? Everitt gives an example in 
which data consisting of two well-separated elliptical groups are clustered into two spherical 
clusters, each containing about half the observations from each group. The clustering 
method used is the widely-used method which minimises the within-cluster sum of squares. 
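
The behaviour of the within-cluster sum of squares criterion can be illustrated with a small constructed data set. This is not Everitt's example: the two elongated groups below are hypothetical, chosen so that the arithmetic is easy to follow. The sketch compares the criterion for the partition into the two true groups with its value for a partition that cuts both groups in half.

```python
def within_ss(clusters):
    """Total within-cluster sum of squared distances to cluster centroids."""
    total = 0.0
    for points in clusters:
        n = len(points)
        mean_x = sum(p[0] for p in points) / n
        mean_y = sum(p[1] for p in points) / n
        total += sum((p[0] - mean_x) ** 2 + (p[1] - mean_y) ** 2 for p in points)
    return total


# Two hypothetical elongated groups: long in x, separated by 3 units in y.
xs = [x for x in range(-10, 11) if x != 0]
group_a = [(float(x), 0.0) for x in xs]
group_b = [(float(x), 3.0) for x in xs]

# Partition 1: the two true groups.
true_partition = [group_a, group_b]

# Partition 2: cut across both groups at x = 0, so that each cluster
# contains half the observations from each group.
everything = group_a + group_b
split_partition = [
    [p for p in everything if p[0] < 0],
    [p for p in everything if p[0] > 0],
]

wss_true = within_ss(true_partition)
wss_split = within_ss(split_partition)
```

Here `wss_true` is 1540 and `wss_split` is 420, so a method that minimises this criterion prefers the partition that mixes the groups, which is the kind of outcome the review describes.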


The only methods of cluster analysis which allow formal statistical tests for the actual 
existence of clusters (against the null hypothesis of a single homogeneous population) are 
those based on mixture models. Everitt describes Wolfe’s method which assumes that the 
density of the observations is a mixture in unknown proportions of multivariate normal 
densities with different means and covariance matrices: 


$$f(x) = \sum_{s=1}^{k} \lambda_s f_s(x;\, \mu_s, \Sigma_s),$$

where $\sum_{s=1}^{k} \lambda_s = 1$. The parameters of the model are estimated by maximum likelihood, and a
sequence of likelihood ratio tests is available for the number of clusters. It is noteworthy
that the only numerical example Everitt gives uses Wolfe’s method, though the likelihood 
ratio tests are not reported, the three cluster solution being discussed as it “ proved the 
easiest to interpret”. The considerable computer time required for parameter estimation 
can be substantially reduced if the covariance matrices are equal. 
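
To make the mixture formulation concrete, the sketch below evaluates the log-likelihood of a univariate normal mixture and forms a likelihood ratio statistic for two clusters against one. It is a simplification under several assumptions: the data are hypothetical, the model is univariate rather than multivariate, and the parameter values are fixed by hand rather than estimated by maximum likelihood as Wolfe's method requires. The reference distribution of the statistic is not addressed here.

```python
from math import exp, log, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def mixture_pdf(x, weights, means, sds):
    """Mixture density: sum over components of weight times normal density."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def log_likelihood(data, weights, means, sds):
    """Log-likelihood of independent observations under the mixture."""
    return sum(log(mixture_pdf(x, weights, means, sds)) for x in data)


# Hypothetical observations forming two tight groups.
data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]

# Hand-chosen parameter values (assumed, not maximum likelihood estimates).
ll_one = log_likelihood(data, [1.0], [0.0], [2.0])
ll_two = log_likelihood(data, [0.5, 0.5], [-2.0, 2.0], [0.2, 0.2])

# Likelihood ratio statistic for two components against one.
lr_statistic = 2.0 * (ll_two - ll_one)
```

In a real analysis both log-likelihoods would be maximised over their parameters before the statistic is formed, and the comparison would be repeated for successive numbers of clusters.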


Chapter 4 gives a conventional treatment of principal component and factor analysis.
The role of the factor model as a statistical model is played down: “... there is an attempt
to avoid embedding the description of the techniques in the framework of a multivariate
normal distribution. Although this approach would allow a close link-up with classical
statistical theory it would not be so readily applicable to practical data analysis. Most data
are not multinormally distributed. ...” But non-normality may have a serious effect on the
possible maximum sizes of correlations between variables, and more importantly, if the 
observed variables are dichotomies then conventional factor analysis is quite inappropriate 
(other latent variable models exist for this situation). The goodness-of-fit test for the number 
of factors is critically dependent on the assumption of multinormality, and an examination 
of the distributions of response variables should precede any factor analysis. 


The section on factor score estimation is confusing. The vague references to indeterminacy
of factor scores might be (undeservedly) alarming to those unfamiliar with the
psychometric controversy, but it is not made clear why factor score estimates are necessary
at all. The use of factor score estimates as predictors of other variables is a dangerous
practice: if an observable ‘response’ variable is believed to be related to observed ‘predictor’
variables, through their common regressions on unobservable factor variables, then a 
structural relations model should be fitted. 


The section on principal component analysis is straightforward, but the applications 
subsection is potentially misleading: “By showing which linear functions of the variables
can be ignored (namely ... the smaller components) it can be used to reduce the data set to
a few major summary items. ... The use of PCA in regression techniques is common.”
Serious difficulties can occur if predictor variables in a regression are converted to principal
variables. If the principal variables with small eigenvalues are dropped from the regression,
a substantial loss in prediction may occur (this happens in the quoted example by Jolliffe).
There is no necessary relation between the size of the eigenvalue and the correlation of the
principal variable with the response. Even if the principal variables are ordered by their
correlation with the response, many more principal variables than original variables may be
necessary in the regression to obtain the same R² (this also happens in the Jolliffe example).
No ‘reduction’ of the data set occurs in such cases.
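
The absence of any necessary relation between eigenvalue and predictive value can be shown with a constructed two-predictor case. The data are hypothetical and are not Jolliffe's: two predictors share a large common component and differ by a small contrast, and the response depends only on the contrast. Because the two predictors have equal variances, the principal variables of their covariance matrix are proportional to their sum and their difference, so no eigenvalue routine is needed.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Covariance of two equal-length sequences (divisor n)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def corr(a, b):
    return cov(a, b) / sqrt(cov(a, a) * cov(b, b))


# Hypothetical data: t is a large shared component, e a small contrast
# constructed to be uncorrelated with t.
t = [-3.0, -1.0, 1.0, 3.0] * 2
e = [0.1, -0.1, -0.1, 0.1] * 2
x1 = [a + b for a, b in zip(t, e)]
x2 = [a - b for a, b in zip(t, e)]

# The response depends only on the difference between the predictors.
y = [a - b for a, b in zip(x1, x2)]

# With equal predictor variances the principal variables are the
# normalised sum and difference of x1 and x2.
pc_large = [(a + b) / sqrt(2.0) for a, b in zip(x1, x2)]
pc_small = [(a - b) / sqrt(2.0) for a, b in zip(x1, x2)]

eigenvalue_large = cov(pc_large, pc_large)
eigenvalue_small = cov(pc_small, pc_small)
r_large = corr(pc_large, y)
r_small = corr(pc_small, y)
```

In this construction the first principal variable carries nearly all the variance of the predictors but is uncorrelated with the response, while the second, with a far smaller eigenvalue, predicts it perfectly. Dropping the ‘smaller component’ would discard all the predictive information.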


Chapter 5 gives a clear formulation of the latent structure model. A simple case, which 
arises frequently in practice, is the multiway contingency table formed by the cross-classification
of survey data. In fitting log-linear models to such tables, it is sometimes found that
inexplicable high-order interactions appear. The latent structure model represents the 
observed table as having arisen by the collapse of a higher-dimensional table over a latent 
or unobserved classification, in which a much simpler structure (usually conditional inde- 
pendence of the observed classifications given the latent classification) holds. 


The general representation for all latent variable models is given in §5.2. Suppose
$x_1, \ldots, x_p$ are observed variables, which can be at any level of measurement, and
$y_1, \ldots, y_m$ are latent variables, also at any level of measurement. Conditional on the
values of the latent variables, the observed variables have a joint probability function
$\phi(x_1, \ldots, x_p \mid y_1, \ldots, y_m)$, while the marginal distribution of the latent variables is
$h(y_1, \ldots, y_m)$, and the marginal distribution of the observed variables is

$$f(x_1, \ldots, x_p) = \int \cdots \int \phi(x_1, \ldots, x_p \mid y_1, \ldots, y_m)\, h(y_1, \ldots, y_m)\, dy_1 \cdots dy_m.$$

Only the marginal distribution of $x_1, \ldots, x_p$ is observable, and we want to estimate the
parameters of the conditional distribution of $x$ given $y$, and the values of $y_1, \ldots, y_m$.


The difficulty here is that all the information about the conditional distribution, and
about $y_1, \ldots, y_m$, has to be obtained from $f(x_1, \ldots, x_p)$, and without some strong structural
assumption, this will not be possible. In the factor analysis model, both the $x_i$ and the $y_j$
are assumed to be normally distributed, and the structural assumption is that of conditional
independence: given the $y_j$, the $x_i$ are independent in the conditional distribution of $x$ given
$y$. In the latent class model, both the $x_i$ and the $y_j$ are polytomous, and a similar conditional
independence structure is assumed. The structural assumption allows the parameters in the 
conditional distribution to be estimated. 
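
A minimal latent class example shows how conditional independence within classes produces association in the observed table. The figures are hypothetical: two equally likely latent classes and two binary items, each item being independent of the other within a class. For a discrete latent variable the integral defining the marginal distribution becomes a sum over classes.

```python
from itertools import product

# Hypothetical two-class model (values chosen for illustration only).
class_probs = [0.5, 0.5]        # marginal distribution of the latent class
item_probs = [
    [0.9, 0.9],                 # P(item = 1 | class 0) for items 1 and 2
    [0.1, 0.1],                 # P(item = 1 | class 1) for items 1 and 2
]


def conditional(x, c):
    """P(x | class c), assuming the items are independent within a class."""
    prob = 1.0
    for value, p_one in zip(x, item_probs[c]):
        prob *= p_one if value == 1 else 1.0 - p_one
    return prob


def marginal(x):
    """Observed-table probability: sum over classes of P(class) * P(x | class)."""
    return sum(w * conditional(x, c) for c, w in enumerate(class_probs))


# The observable 2 x 2 table obtained by collapsing over the latent class.
table = {x: marginal(x) for x in product([0, 1], repeat=2)}

p_item1 = table[(1, 0)] + table[(1, 1)]
p_item2 = table[(0, 1)] + table[(1, 1)]
p_both = table[(1, 1)]
```

Each item has marginal probability 0.5, so independence in the observed table would require `p_both` to be 0.25; it is 0.41. The association in the collapsed table arises entirely from the latent classification, which is the structure the latent class model is designed to recover.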


A very wide range of latent variable models can be constructed by suitable choice of the
form of the $y_j$ and $x_i$. It is unfortunate that the estimation of parameters by maximum
likelihood is almost dismissed in §5.4.3: “A maximum likelihood approach to the estimation
of parameters in other latent structure models is possible in principle but this would require
assumptions about the forms of the distribution of the latent variables and the trace function
[$\phi(x \mid y)$]. ... It is likely, however, that such procedures would give rise to great complexity
and impracticability.”


But without a distributional assumption (e.g. that the factors are normally distributed 
in a factor model), we have neither any optimal method for estimating the parameters of the 
model, nor any goodness-of-fit test for the appropriateness of the model. 


The ‘accounting equation’ methods originally developed for the latent class model do
not provide optimal parameter estimates, and indeed can produce a whole series of different 
consistent estimates. Some insight into the distribution of the latent variable can be 
obtained from the implied marginal distribution of the observable variables: for example 
in a two-group cluster model in which the observed density is a mixture of two multivariate 
normal distributions, a probability plot of the observed data can be compared with that 
implied by the mixture model, once the parameters have been estimated. 


Chapter 6, by Coxon and Jones, is on multidimensional scaling. There is a clear and 
detailed, though non-technical, discussion of non-metric multidimensional scaling, and a 
long discussion of the application of INDSCAL to a sociological example. Goodness-of-fit 
questions are barely mentioned, and neither are sampling questions, though the authors 
do give a ‘word of warning’: “The sampling properties of almost all MDS models and
procedures are little known, although a start has been made on the problem ... MDS
procedures should therefore be used with the caution one reserves for any very powerful
but unfamiliar procedure.”


The example illustrating the use of INDSCAL has six a priori distinguishable groups, 
with an average of about 160 individuals in each group. The data consist of responses on a 
21-item inventory. A two-dimensional latent space is identified, the mean scores for each 
group plotted in the space, and the ‘loadings’ of each item on the latent dimensions also
plotted. The further information provided by INDSCAL which is not available from a 
factor model is a set of ‘shrinkage’ factors for each group, which represent the varying 
importance of the latent dimensions to each group. 


The absence of a statistical model for INDSCAL (as for other MDS methods) means
that no formal tests are available for the number of dimensions, the significance of ‘loadings’
of items on each dimension, the significance of group mean differences in the latent space, or
the significance of the individual group shrinkage factors.


Chapter 7, by McKennell, gives a detailed discussion of attitude scale construction in an 
exploratory attitude survey. There is much good advice on pilot surveys and the selection 
of items for scales, and a non-technical discussion of Guttman, Likert and Thurstone scaling. 
Cluster and factor analysis of items to produce homogeneous scales are discussed at some 
length, in terms of the effect of item selection on reliability. Statistical considerations are 
again largely absent, though there are some brief references to measurement error and the 
need to correct observed validities and reliabilities for measurement error, in the context of 
path analysis and latent variable models. 


Chapter 8, by Fielding, is on binary segmentation and AID. The idea of binary splitting 
and the resulting tree structure are illustrated by a simple example of a three-way classifica- 
tion of absence from work by age, sex and skill level. The extension of the method to the 
general cross-classification and a multivariate response is then given, and illustrated by a 
survey of flat prices classified by five factors. A very complex tree structure is produced by 
AID for this example, and is discussed briefly. There is a long section on the stability of tree 
structures under repeated sampling, the choice of stopping rules and questions of statistical 
significance. 


There is, as the author notes, a “traditional approach to such data analysis . . . in the
context of models. Possible interaction effects are incorporated in a model, the parameters
estimated and tested for significance.” However this (analysis of variance) approach is
considered to be unsatisfactory: “. . . [it] requires a detailed parametric model specified at
the outset, which must be assumed to be appropriate. . . . If data are sparse undue restrictions
are placed on the parameters so that even if a full and well specified model is appropriate
for the structure at hand, it becomes unrealistic. What is necessary in exploratory research
is an approach which is free from such restrictions. . . .”


The author's dismissal of ‘restrictive models’ is inadequate on several counts.


First, the specification of a ‘parametric model’ includes the specification of the
probability distribution of the response variable. This is not something which ‘must be assumed
to be appropriate’, but is open to empirical verification, by probability plotting of residuals
from the model, and by plotting of standard deviation against mean for each cell of the
cross-classification. In practical examples I have met of the two variables the author quotes,
absence from work and flat prices, both have very skewed distributions and are
nearly normalised by a log transformation. Simpler models often result, with fewer
interactions necessary, when the appropriate scale is used. This has appeared most noticeably
in recent attempts to fit linear models to proportions in contingency tables. Tables which
can be represented by simple interdependence models on the logistic scale show high-order
interactions on the linear scale which have to be ‘explained’.


More generally, a wide range of continuous and discrete probability models is available, 
and probability plotting procedures are available for most continuous distributions. 


Second, sparsity of data (for example, some empty cells in the complete cross-classifica- 
tion) does not prevent models being fitted and tested, if hierarchical model fitting procedures 
are used (for a detailed discussion, see Aitkin, 1978). For each empty cell, one parameter 
cannot be estimated, but a hierarchical procedure will be able to fit all estimable main effects 
and interactions up to the total number of filled cells, usually far more than necessary to 
obtain a parsimonious model which fits adequately. For large survey models, it will usually 
be sufficient to fit 150-200 parameters in a regression model, which is not an impossibly large
number, though it is outside the range of most statistical packages. 


Third, AID is itself not free from implicit model restrictions. The splitting criterion is 
based on maximising the between-group sum of squares resulting from the binary split into 
two groups. But why is this a desirable ‘formal procedure in algorithmic form’? There is
an implicit assumption that the procedure will recover a corresponding population structure 
in which the mean response varies over groups. But sums of squares procedures are optimal 
only if the response variable is normally distributed, and if it is skewed or binary, optimal 
procedures do not use simple sums of squares.


Fourth, the end result of an AID analysis is often a tree structure of baffling complexity,
as in the flat price example on pp. 236-7. The interpretation of the interactions found is a 
major task, and in the example no satisfactory description is given of how price varies with 
the five explanatory variables. “Interaction is a complex concept and no steadfast rules for
interpretation can be laid down. Each data set must be examined on its own merit. The
results of the algorithm have reduced the mass of information to manageable proportions.
A careful reading of the accumulating literature on experimentation with AID and its
interpretation in published applications is recommended to the potential user. . . .” (!)


The author quotes one of the few theoretical studies based on a statistical model (p. 
253) and concludes that the stopping rules conventionally adopted for the size of the ratio 
of between-group sum of squares to total sum of squares may be much too low—that is, 
that the tree structure produced may be much more complex than warranted. 
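The splitting criterion and the stopping ratio described above can be illustrated with a minimal sketch. It is not the AID program itself: it assumes a single categorical predictor whose categories are ordered by mean response, and the function name `best_binary_split` and the absence data are invented for illustration.

```python
def best_binary_split(cells):
    """cells maps a category label to the list of responses observed in it.

    Returns the two sets of labels, the between-group sum of squares (BSS)
    of the best split, and the ratio BSS / total sum of squares.
    """
    # Order the categories by their mean response, then try each cut point.
    labels = sorted(cells, key=lambda k: sum(cells[k]) / len(cells[k]))
    all_values = [y for k in labels for y in cells[k]]
    grand_mean = sum(all_values) / len(all_values)
    tss = sum((y - grand_mean) ** 2 for y in all_values)

    best = None
    for cut in range(1, len(labels)):
        left = [y for k in labels[:cut] for y in cells[k]]
        right = [y for k in labels[cut:] for y in cells[k]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        # Between-group sum of squares for this two-group split.
        bss = (len(left) * (mean_left - grand_mean) ** 2
               + len(right) * (mean_right - grand_mean) ** 2)
        if best is None or bss > best[2]:
            best = (labels[:cut], labels[cut:], bss)

    left_labels, right_labels, bss = best
    return left_labels, right_labels, bss, bss / tss


# Hypothetical data: days absent by skill level (values invented).
absence = {
    "skilled": [1, 2, 2, 3],
    "semi-skilled": [2, 3, 4, 3],
    "unskilled": [6, 8, 7, 9],
}
left, right, bss, ratio = best_binary_split(absence)
print(left, right, round(bss, 2), round(ratio, 3))
```

A stopping rule of the kind discussed would compare `ratio` with a threshold before accepting the split. Nothing in the sketch refers to a probability model for the response, which is the point made in the text.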


In §8.3.5, trees produced by AID are used to suggest regression models incorporating 
plausible interaction terms. The methods suggested for fitting such models are unsatisfactory 
(see Aitkin, 1978, for further discussion). But the need to develop a ‘predictive model’
based on multiple regression only makes clear the limited value of AID: appropriate parsi- 
monious regression models may be developed entirely within the framework of conventional 
linear models, and AID is unnecessary. 


The second volume begins with a chapter by Knott and O’Muircheartaigh on classical 
estimation and hypothesis testing. The section on point estimation covers bias, minimum 
variance and mean square error, admissibility, asymptotic properties, sufficiency and invari- 
ance, and least squares, maximum likelihood and Bayes’ estimation. Interval estimation 
covers unbiasedness, asymptotic properties, and pivotal variables. Hypothesis testing dis- 
cusses the Neyman-Pearson lemma, likelihood ratio tests, simultaneous testing, and goodness 
of fit. 


Bibby’s Chapter 2 is titled “The General Linear Model—A Cautionary Tale”. The
second paragraph begins: “But the simplicity of the GLM should not obscure the fact that
it can also be a trap, a snare, and a delusion. The traps and snares are seductive and far from
self-evident. . . .” The trap, snare and delusion comment is repeated in the conclusion in
§2.7.


The argument given in support of this critical view is unconvincing when statistical, and 
unfair when not. After a discussion of the lack of information in the correlation matrix 
about complex interrelations among variables, the author introduces regression analysis 
using ordinary least squares in §2.2. The Gauss-Markov theorem is stated with heavy
emphasis on the assumptions, and common cases of failure of the assumptions are described.
Notably lacking is any reference to a probability model for the errors, even in the discussion
of residual examination. This is unfortunate, as it lends support to the ‘technique’ rather
than ‘model’ philosophy. Some of the criticisms expressed of least squares are really
criticisms of the assumption of a normal error distribution (which is never raised, though 
t- and F- tests are mentioned in passing), and could have been omitted if the choice of a 
probability model (including the possibility of response variable transformations) had been 
discussed. 


The section (§2.2.5) on ‘hidden traps’ is unconvincing. The author quotes from an
article in the American Journal of Sociology which gives several examples, in one of which
two subsets of predictor variables are equally correlated 0.8 within subset, equally correlated
0.2 between subsets, and equally correlated 0.6 with the response. There are three variables in
the first subset and two in the second, and the regression coefficients are 0.19 for variables in
the first set, but 0.27 for variables in the second. The author asks: ‘why is this?’, and
discusses this ‘phenomenon of differential repetitiveness’. An examination of the values of
R² for different subsets would have been informative. Consider the simpler situation in
which there are n predictor variables, all equally correlated 0.8, and all equally correlated 0.6
with the response. The regression coefficient is 3/(1 + 4n) for each variable, and R² is
1.8n/(1 + 4n). As n increases the regression coefficient approaches zero, and R² rapidly approaches
a limit of 0.45 (taking the values 0.36, 0.40 and 0.41 for n = 1, 2, 3). In practical terms there is
very little information in the third and subsequent variables, and increasing the number of
variables simply spreads the available information more thinly over more variables.
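The closed forms quoted above can be checked numerically. For standardised variables the least squares coefficients solve Rb = r, where R is the predictor correlation matrix and r the vector of correlations with the response, and R² is the sum of the products of each coefficient with its correlation. The sketch below is only a check of the arithmetic; the helper names `solve` and `equicorrelated` are invented here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def equicorrelated(n, rho=0.8, r=0.6):
    """Common coefficient and R-squared for n equicorrelated predictors."""
    corr = [[1.0 if i == j else rho for j in range(n)] for i in range(n)]
    beta = solve(corr, [r] * n)
    r_squared = sum(b * r for b in beta)
    return beta[0], r_squared


for n in (1, 2, 3, 10, 100):
    beta, r_squared = equicorrelated(n)
    # Columns: n, numerical coefficient, 3/(1+4n), numerical R2, 1.8n/(1+4n)
    print(n, round(beta, 3), round(3 / (1 + 4 * n), 3),
          round(r_squared, 3), round(1.8 * n / (1 + 4 * n), 3))
```

The numerical and closed-form columns should agree, with R² levelling off just below 0.45 as n grows.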


A ‘further problem’ is said to be identified when correlation coefficients vary within
each group. When the two variables in the second group referred to above have correlations
0.6 and 0.55 with the response, the corresponding regression coefficients change considerably
to 0.38 and 0.13. Again an examination of R² clarifies the situation. In the simpler case
above, with two predictor variables correlated 0.8 with each other and 0.6 with the response,
the regression coefficients are each 0.33, and R² = 0.40. If the correlations with the response
are 0.6 and 0.55, the regression coefficients are 0.44 and 0.19, and R² = 0.37: the increase in R²
over that (0.36) for the first variable is negligible, and the second variable is irrelevant given
the first. The conclusion, on the basis of this evidence, that “OLS estimators . . . can be
extremely misleading and dangerous to use” is quite unwarranted. The evaluation of
standard errors for these regression coefficients, and tests for nullity of subsets of regression
coefficients, would also have clarified the above example, and would have expanded the
passing mention of biased estimation.
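The figures in the two preceding paragraphs can be reproduced from the stated correlations alone, again by solving Rb = r for standardised variables. This is a sketch for checking the arithmetic, not code from the chapter under review; the helper names (`solve`, `standardised_fit`, `two_subset_corr`) are invented here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def standardised_fit(corr, r_xy):
    """Standardised regression coefficients and R-squared from correlations."""
    beta = solve(corr, r_xy)
    r_squared = sum(b * r for b, r in zip(beta, r_xy))
    return beta, r_squared


def two_subset_corr(within=0.8, between=0.2, sizes=(3, 2)):
    """Correlation matrix for two subsets of predictors."""
    group = [g for g, size in enumerate(sizes) for _ in range(size)]
    p = len(group)
    return [[1.0 if i == j else (within if group[i] == group[j] else between)
             for j in range(p)] for i in range(p)]


corr5 = two_subset_corr()
# All five predictors correlated 0.6 with the response.
beta_equal, r2_equal = standardised_fit(corr5, [0.6] * 5)
# Last predictor correlated 0.55 with the response instead.
beta_unequal, r2_unequal = standardised_fit(corr5, [0.6, 0.6, 0.6, 0.6, 0.55])
# The simpler two-predictor case.
beta_pair, r2_pair = standardised_fit([[1.0, 0.8], [0.8, 1.0]], [0.6, 0.55])

print([round(b, 2) for b in beta_equal])
print([round(b, 2) for b in beta_unequal])
print([round(b, 2) for b in beta_pair], round(r2_pair, 2))
```

Comparing `r2_pair` with the value 0.36 obtained from the first variable alone shows how little the second predictor adds, whatever the sizes of the individual coefficients.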


The section on residual plotting is useful, though it could have been improved by real
rather than contrived examples. Section 2.4 on ‘ extensions of the model ’ deals briefly with 
three applications: analysis of variance and covariance using dummy variables, and path 
analysis. The treatment of the first two is so brief as to be unhelpful, and the idea of fitting 
a sequence of models of increasing complexity is not brought out. Path analysis receives 
more attention, and considerable criticism. Section 2.5 deals with extensions of the model 
in which failure of the assumptions of the Gauss-Markov theorem occurs: correlated errors 
and heterogeneity of error variance, leading to generalised least squares, measurement error 
in the predictor variables, leading to instrumental variable estimation, and the analysis of 
proportions by log-linear models. Further extensions to multiple dependent variables and 
simultaneous equation estimation, canonical analysis and discriminant analysis are mentioned 
and a few references given. 


In his concluding section, Bibby mounts a frontal attack on the formulation of linear 
models to deal with substantive questions. He describes a survey to answer the question of 
what factors affect the educational and occupational aspirations of schoolboys. This
question, he says, “could be stated, understood, and answered by the literate man in the street,
[while] the new ‘statistical’ question [in terms of a linear model] is formulated in such a way
that only a few experts can understand it, let alone express an opinion”. Among other
undesirable side effects, the GLM formulation “narrows the question by restricting its terms
of reference to a particular model based upon a particular set of observed empirical data.”


One cannot have it both ways. If the ordinary literate man in the street can answer the
above question, then there is no need for an expensive survey by social statisticians to confirm
his answer. Indeed, if the question can be answered without reference to any “particular set
of observed empirical data”, every man, literate or illiterate, can provide his own answer
with equanimity. The point of a statistical formulation of the problem is that questions 
about the role of social processes have to be expressed in statistical terms, involving a properly 
designed survey of a well-defined population, and an appropriate probability model for the 
resulting set of empirical data, if we are to come to conclusions which will be accepted by the 
general scientific community, as well as the literate man in the street. 


To claim that the linear model formulation “. . . trivializes the question by removing its
substantive richness and substituting apparently meaningful yet potentially trivial questions
concerning the values of unknown regression coefficients”, and substitutes “. . . technical
matters such as bias, misspecification [and] optimality . . .” for “discussions concerning the
nature of the social forces in action”, is to grossly distort both the role of the statistician and
statistical theory, and the general level of statistical sophistication, in social research.


Chapter 3 by Macdonald on path analysis begins with a disclaimer: “So this is the
practical researcher's guide to path analysis. . . . The presentation of such a guide should not
be held to entail the desirability of the technique. But the technique can provide useful
summaries of our survey data. And it is widely used. So it behoves us as researchers at
least to know how to use it.”


Given this equivocal attitude, the author owes it to his readers to give some warning of
when path analysis is not a desirable technique, especially as some of its proponents claim
far more than that it provides only a ‘useful summary’ of survey data. While there is some
discussion of whether standardised or unstandardised regression coefficients should be used,
and how to determine indirect effects of variables, this chapter suffers from a major weakness,
which is endemic in discussions of path analysis and causal modelling.


The author makes clear that an a priori causal ordering of the variables is one of the
assumptions of the model: “Our starting assumption for path analysis must be a
specification of the predictive ordering between the variables of the model . . . the ordering
assumption is basic. There is no way of deriving this ordering by inspection of the data.” In other
words, causality is not in the data, but in the eye of the beholder.


Once a final model is obtained, the author cautions in his conclusion against assuming
that the prediction equation established from a particular survey would apply if one of the
predictor variables is systematically altered: “We see that a unit increase in Xj has a large
impact on [the response] X1: we are interested in changing X1 and decide to change Xj.
But our prediction equation is a prediction equation for individuals within the current system;
not a prediction equation for changes in the system.”


But even this limited interpretation is incorrect, and the definition on p. 86: “A
conventional path coefficient gives (under the assumptions of the model) the expected effect of a
change of one standard deviation in the variable (holding other variables constant) . . .”
is seriously misleading. If the regression coefficient of father's income on father's education
is 0.4, what does this mean? It is impossible to change father's education, and there may not
exist subpopulations with identical values of all variables other than education. To state 
the point briefly but firmly, causal inferences cannot be legitimately drawn from survey data. 
The causal effect of a unit change in a variable x on the mean value of y can only be 
estimated by performing an experiment in which the controlled variable x is systematically 
varied, unconfounded with other variables, in a properly randomised experiment. Attempts 
to draw causal inferences from non-randomised survey data have resulted in futile
controversies, as for example over the role of genetic and environmental ‘effects’ on intelligence (for
a trenchant criticism of this debate, see Kempthorne, 1978). 


Chapter 4 by Payne is on the log-linear model for contingency tables. The treatment is 
clear and comprehensive, with a good discussion of the log-odds ratio, conditional indepen- 
dence in three-way tables, the iterative weighted least squares and iterative scaling computa- 
tional procedures, goodness-of-fit and model selection, and logistic models for binary 
responses. The selection of a model by partitioning the goodness-of-fit statistic provides a 
close parallel with the ANOVA of an unbalanced cross-classification. This parallel holds
for the symmetrical table (all classifications explanatory) as well as for the asymmetrical 
table (one classification a response, the others explanatory). In the former case, we interpret 
the cell counts as independent Poisson variables (subject to any constraints imposed by fixed 
marginal sub-table numbers), and the log-linear model is then a model for the log of the 
mean number of observations occurring in each cell. Model selection in contingency tables 
is a subject of current research: a discussion is given in Aitkin (1979). 


Chapter 5 by Bartholomew gives an overview of recent research on stochastic models 
for population change (only seven of the 24 references are earlier than 1970). This chapter is 
written at a considerably higher mathematical level than the others, and the interested reader 
will need a good background. The basic discrete time model is developed in terms of ‘stocks’,
or numbers in each state of the system, and ‘flows’ between states. The Markov chain model
is considered first, with estimation of transition probabilities and a test for their constancy 
at each step. More general tests for the Markov property are also considered. Continuous 
time models are introduced by the Markov process, and semi-Markov and multiple decre- 
ment models are considered in considerable detail, including estimation from grouped and 
incomplete data. 
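The stocks-and-flows formulation of the discrete time Markov chain model lends itself to a short illustration. The sketch below assumes the usual estimate of a transition probability, the observed proportion of those in state i who move to state j in one step; the function names and the three-state flow table are hypothetical and are not taken from the chapter.

```python
def estimate_transitions(flows):
    """flows[i][j] is the number observed moving from state i to state j.

    Each row is assumed to contain at least one observation.
    """
    probs = []
    for row in flows:
        total = sum(row)  # stock in state i at the start of the step
        probs.append([count / total for count in row])
    return probs


def project_stocks(stocks, probs):
    """Expected stocks after one further step under constant transitions."""
    k = len(stocks)
    return [sum(stocks[i] * probs[i][j] for i in range(k)) for j in range(k)]


# Hypothetical flows between three states; the third state is absorbing.
flows = [
    [80, 15, 5],
    [10, 70, 20],
    [0, 0, 100],
]
probs = estimate_transitions(flows)
next_stocks = project_stocks([100, 100, 100], probs)
print(probs)
print(next_stocks)
```

A test of constancy of the kind mentioned in the text would compare such estimates across successive steps; that comparison is not attempted here.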


Chapter 6, by Bebbington and Smith, is a discussion of the effect of stratified and 
clustered survey designs on the estimation of a correlation matrix. For a population of 
2000 units grouped into 40 strata of 50 units each, 1000 samples of 200 are drawn according 
to each of four sampling designs. For each sample, the correlation matrix of 12 attitude 
items is calculated, the individual items having distributions (on a five-point scale) which 
vary from bell-shaped to J-shaped. Biases of the individual correlations are considerable for 
cluster sample designs, and the sampling variability of the latent roots of the correlation 
matrix is greatly increased over normal theory values in these cases. 


The authors conclude that “. . . insufficient thought has been given to the relationship
between survey design and underlying multivariate models. If a population admits a
meaningful grouping into strata or clusters then it will also admit an analysis at the within 
group level rather than the population level.” Generalised variance components models are 
necessary for such an analysis, and are a subject of current research. 


The final chapter by O’Muircheartaigh is on response errors. The first three sections 
provide a clear introduction to sources of response error, and their identification and
diagnosis. The mathematical model developed in §7.4 is closely related to psychological
‘true-score’ measurement models, and to variance component models. Estimation of interviewer
variance is considered in detail. 


There are a few statistical errors. In Volume 1, on p. 32 regression lines are incorrectly 
drawn as the major axes of concentration ellipses. On p. 113, the claim that a factor
structure can always be found if (p − q)² = p + q (where p is the number of variables, q the number
of factors) is incorrect. The latent class ‘model’ described on p. 158 for a 2 × 2 table is
trivial because it requires the same number of parameters as there are observations, so is not 
falsifiable as a model. 


Referencing is often annoying: many references in the text are not given in the reference 
section at the end of the chapter, are out of alphabetical sequence, or are incorrect. 


The most important failing of these two volumes, and of survey analysis in general, is the 
lack of attention to statistical models, to efficient methods of estimation, and to tests for 
goodness-of-fit. This applies particularly to those ‘exploratory’ techniques which use
implicit latent variable models: cluster analysis, factor analysis, and latent structure analysis,
but also to multidimensional scaling methods and AID. Without formal probability models
for survey data, the output of numerical algorithms cannot be assessed for sampling
variability, and the meaningfulness of ‘solutions’ has to be assessed in terms of their
‘interpretability’.


Recent developments in statistical theory hold the promise of profound changes in this 
situation. The structural model approach of Jöreskog and his colleagues at Uppsala (see
Jöreskog and Sörbom, 1978), made generally available in the LISREL and EFAP computer
programs, is extremely powerful and general, and allows a unified approach to a diverse 
class of models with measurement error in the variables, factor structures for observable 
variables in terms of latent variables, and instrumental variables. Maximum likelihood 
estimation in a remarkably wide class of models involving latent variables, missing, censored, 
grouped or truncated data, mixtures of distributions and unbalanced variance component 
models can be very simply obtained by the EM algorithm (Dempster, Laird and Rubin, 
1977). Over the next few years, wide dissemination of these developments should have a 
marked effect on the analysis of survey data. 
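As an illustration of how simply maximum likelihood estimates can be obtained by the EM algorithm, the following minimal sketch fits a mixture of two univariate normal distributions, the one-dimensional analogue of the two-group cluster model mentioned earlier. The data are simulated, and the starting values, iteration count and function names are arbitrary choices made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(data, mean1, mean2, sd1=1.0, sd2=1.0, weight1=0.5,
                   iterations=200):
    """EM iterations for a two-component univariate normal mixture."""
    n = len(data)
    for _ in range(iterations):
        # E-step: posterior probability that each point is from component 1.
        resp = []
        for x in data:
            a = weight1 * normal_pdf(x, mean1, sd1)
            b = (1.0 - weight1) * normal_pdf(x, mean2, sd2)
            resp.append(a / (a + b))
        # M-step: weighted proportions, means and standard deviations.
        n1 = sum(resp)
        n2 = n - n1
        mean1 = sum(w * x for w, x in zip(resp, data)) / n1
        mean2 = sum((1.0 - w) * x for w, x in zip(resp, data)) / n2
        sd1 = math.sqrt(
            sum(w * (x - mean1) ** 2 for w, x in zip(resp, data)) / n1)
        sd2 = math.sqrt(
            sum((1.0 - w) * (x - mean2) ** 2 for w, x in zip(resp, data)) / n2)
        weight1 = n1 / n
    return weight1, mean1, sd1, mean2, sd2


# Simulated data: equal mixture of N(0, 1) and N(5, 1), with a fixed seed.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(5.0, 1.0) for _ in range(200)])

weight1, mean1, sd1, mean2, sd2 = em_two_normals(data, min(data), max(data))
print(round(weight1, 2), round(mean1, 2), round(sd1, 2),
      round(mean2, 2), round(sd2, 2))
```

The sketch ignores the practical difficulties of mixture fitting, such as poor starting values and components whose variance collapses towards zero, and it fixes the number of iterations rather than monitoring convergence.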


BOOK REVIEWS


HOGHUGHI, M. (1978). Troubled and Troublesome: Coping with Severely Disordered
Children. London: Burnett Books/André Deutsch, pp. 312, £6.95.


Although the literature on children's emotional and behavioural problems has greatly
expanded in recent years, little has been written on adolescents whose behaviour is so 
extreme that they need ‘ secure’ placement. This book, therefore, which aims to describe 
severely disordered children, to explain why they become so and provide guidelines as to 
how they might be managed, helps to fill a gap, especially as it deals with the problems of 
girls as well as boys. As principal of a school providing assessment and treatment facilities 
for a wide range of severely disordered and delinquent children (not all of whom require 
secure facilities), Dr. Hoghughi is able to write from a firm base of experience and with 
first-hand knowledge of the backgrounds and needs of these children. 


Part I of the book gives an account of a study comparing three samples—62 girls admitted
to a ‘secure’ house over a period of two years, 62 boys placed in the house over the same
period, and 62 boys admitted to one of the ‘open’ houses in the school. Comparisons are
made between these three groups on a large number of variables, though the absence of any
information (other than in a short general paragraph) on the methods of analysis employed,
or on the actual levels of significance found in the case of differences reported, makes it very
difficult to assess the value of the findings. Some of the tables, too, are difficult to interpret
or presented in an idiosyncratic way, e.g. ‘accuracy of social perception’ is classified in
terms of ‘good’, ‘bad’ or ‘average’; ‘special schooling’ in terms of ‘maladjusted’,
‘day ESN’, ‘residential’, ‘behaviour problems’, ‘expelled/suspended’ and ‘teacher
assault’; and ‘maladjustment’ in terms of ‘severe’, ‘moderate’, ‘aggressive’, ‘emotional’
and ‘behavioural’. Furthermore, as all three samples were being dealt with in a very
specialised setting, we do not know how different the ‘extreme’ children really are from other
children, or to what extent the findings can be generalised.


Part II, which ranges widely on ‘Aspects of Intervention’, contains much material of
interest, but is somewhat repetitive and lacking in coherence. One comes across occasional
eyebrow-raising statements which need further elaboration if they are not to be misunderstood.
For example, while aware of the dangers of labelling, Dr. Hoghughi advocates that those
working with severely disordered children should start with the expectation that the ‘worst’
will happen (pp. 175/6). In spite of all his caveats, his definition of a ‘problem’ as an
‘unacceptable condition’ is not very helpful, particularly as he states that questioning of
parental authority is ‘acceptable’ in a 16-year-old but not in a 10-year-old (p. 223).


However, while the book has its weaknesses if judged on strict research criteria, it pro- 
vides useful information on the range of placement and treatment possibilities for severely 
disordered adolescents and thoughtful discussions on coping with them on a daily basis. 
Hopefully, also, it will encourage improvements in the present system, so forcefully attacked 
by Dr. Hoghughi, who calls attention to the lack of systematic knowledge about problems of 
‘extreme’ behaviour, the inadequacy of current methods of intervention and of professional
training, and the confusing nature of clinical assessment and classification. 

MAURICE CHAZAN. 


MECHANIC, D. (1978). Students under Stress. London: University of Wisconsin 
Press, pp. xxi + 231, c. £10.50.


This reprint of an earlier work published in 1962 is still an important book in the field 
of research on stress in the academic milieux. Its value is now enhanced by an updated 
foreword written with skill and sensitivity by the author in 1977. The book describes valu- 
able experimental research in assessing the adaptation to stress of a small group of 22 
participants, mainly Ph.D. students. 


The research places an emphasis on assessing constructive coping behaviour over a
period of time before and after the preliminary doctoral examination in a faculty setting.
As such it is an exercise in observing stress in a relatively fixed social situation, and can, to
some extent, be regarded as an assessment of stress in conditions of what the author calls a
‘continuing exposure to noxious work in a living environment’, which would tend to set up
a vicious circle of ‘incubated anxiety’ (Eysenck).


To this extent the author's use of the ‘combat anxiety’ paradigm (Grinker and Spiegel)
seems inappropriate, except as a framework for observation of group cohesion or group 
disintegration. In combat, despite the supreme peril of death, which is not present in the 
non-combat stresses, both the individual and the group have a legitimate target of hate—the
common enemy—which allows for relief of hateful anger as well as for a largely guiltless 
intention and an overt attempt to annihilate the opponent. The conflict is clear, the stresses 
of prolonged high arousal of fear and anger can be worked out in action, or anyway in 
preparing or training for action or openly talking about the foe. The objectively far milder 
stresses of an impending academic examination can, paradoxically, be more insidious and 
wearying to some individuals, in the sense that the students feel threatened but must pretend 
that they like, respect and cherish their adversaries of the faculty who are going to judge them. 
The faculty does not literally shoot the failing students. Their violence is purely symbolic, 
and defended as necessary for the preservation of academic standards, or ‘ for the students’ 
own good’. Nor can the students mount an overt attack, verbal or otherwise, on their
‘friendly’ foes! The situation, if at all in any sense parallel to combat, is perhaps more akin
to a covert ‘special operation’ assignment, or an intelligence, or counter-intelligence exercise
in an overtly friendly territory!


The fact that it is at all possible to use the combat metaphors in describing educational 
milieux and reactions to their procedures should, of course, be abhorred, as much as the still 
common notion in schools and academies that examinations should be used as testing points 
of endurance under stress. Both are, to me, barbarian conceptions, but the history of edu- 
cated mankind does not seem to vouchsafe a more civilised approach. Sporadic, benign 
models of education do appear from time to time, but the formal, mass education for 
examinations, like mass sport spectacles, seems to remain in the realm of a substitute combat. 
Sad as it is, the republication of this useful book stands as a confirmation of this fact, and the 
author is not to blame for his courage to treat things as they are. As he so aptly sum- 
marises: “ The result is that the examinations for many if not for most students is just a poor 
representation of an intellectual challenge." 


The shortcoming of the experimental side of stress studies described in this book is the 
absence of some objective instruments for assessing stress. Although, from the author's 
versatile approach, one can be quite confident that he must be skilled in assessing stress and 
anxiety in clinical contact, the absence of a standardised moderating tool makes replication 
of his research impossible. 

J. A. WANKOWSKI.


RUDDOCK, J. (1978). Learning through Small Group Discussion. Guildford: SRHE,
pp. viii + 137, p. £4.20.

ABERCROMBIE, M. L. J., and TERRY, P. M. (1978). Talking to Learn. Guildford:
SRHE, pp. 158, p. £4.20.


These two monographs, published for the Society for Research into Higher Education, 
are the outcome of distinct but related research projects. The first is addressed to inexperi- 
enced teachers and experienced teachers who either have not been involved in small group 
work or, if they have, have found the experience difficult. In the book the experience of 
colleagues who have experimented with group teaching was used to identify the general 
problems and issues that arise in connection with work of this kind. In the second book, 
various methods which have been used in an attempt to help tutors and students to gain 
insight into the group process are discussed. The second monograph thus is a useful comple- 
ment to the first. 

The distinctive contribution claimed for small group teaching is that it encourages
students to become more autonomous learners. To achieve this goal, both books emphasise
that the roles of teachers and students in small groups require them not to act as givers and
receivers of information respectively, but rather to co-operate in the exploration of knowledge.
In addition, the relationships amongst students, instead of being ignored as of neutral or 
negligible value in the education process, must be recognised and fostered as a powerful 
medium for interaction or collaborative learning. 


Jean Ruddock’s book transforms these rather vague and general orientations into a 
series of practical questions, the two most basic of which are: ‘How can learning be encouraged
through participation?’ and ‘How can the seminar leader make an effective
contribution?’ Also included are chapters dealing with topics such as group size, monitoring 
the group process, leaderless groups and the problem of training students for small group 
work. Throughout the book, the emphasis in the discussion is on the practical issues involved 
and the actual experience of participants. The format of each chapter is similar: in the 
opening section the issues are identified, often using extracts from the published literature. 
Then in the second section the author presents ‘the evidence’. This evidence takes the form
of selected extracts either from interviews with colleagues or from the transcripts of group 
discussions. While an element of repetition is unavoidable in a sectional treatment such as 
this, the communication of the immediate thoughts and feelings of group participants work- 
ing in universities and other institutes of higher education gives the book an obvious rele- 
vance to colleagues who are interested in trying out such group work. They will not find, 
however, any lists of behaviours or skills which, when acquired, will lead them along the road 
to ‘success’, however that is defined. More realistically, what is conveyed is the complexity
of the small group process and the variability and diversity of factors which may combine 
to affect that process in any particular group in any particular institution at any particular 
time. From an awareness of the complexity of the group process the author hopes that 
readers will acquire understanding that will serve as the basis of further study of their own 
work in small groups. 


The second book, Talking to Learn, also uses extracts of transcripts from video-record- 
ings and group discussions as the main medium to explore further the nature of the under- 
standing that is appropriate to effective participation in small group teaching and learning. 
It is also concerned with evaluating ways in which such understanding can be developed in 
tutors and students. Its basic premise is that the assumptions participants make about their 
roles in the teaching and learning processes, encapsulated in their philosophies of education, 
affect their manner and behaviour and powerfully influence what goes on in the small group. 
The unconscious nature of many of these assumptions and how they may be associated with 
behaviour which works contrary to the intentions of the participants is well illustrated in the 
book. Hence awareness of assumptions and the way they are related to behaviour is a step 
on the way to subsequent changes in the attitudes and behaviour of group members. 


Three methods to bring about such changes were tried in the project. In the first of
these, a teacher and his students viewed video-recordings of their own tutorials. Four tutor 
groups were involved—one concerned with Swedish poetry, another with English literature, 
the third with Geography and the last with French prose. Each group discussed the intellec- 
tual content and the way this was influenced by their behaviour. In the second approach a 
small group of teachers discussed amongst themselves, for at least 10 weekly sessions, their
personal experience of small group teaching. For the third approach, larger numbers of 
teachers were involved at a less intensive personal level. They studied group interactions 
from selected video tapes with annotated transcripts in which comments based on retro- 
spective understanding and analysis had been inserted. 


The effectiveness of the first approach is found in the transcripts themselves. They 
show, for example, the way in which tutors move away from dominating the seminar talk to 
listening responsively to their students, how students change their perceptions and behaviour 
in relation to preparing for seminars, and how participants begin to attend to non-verbal 
behaviour as an important element in group interaction. Transcripts of the group discussions 
that took place in the second method show similar changes. Tutors’ self reports are also 
quoted: as one indicated, ‘It’s like a sudden injection of self-awareness.’ Because the 
third approach was less intensely personal, evaluation of its effectiveness was more difficult. 
Questionnaire responses, however, reported changes in teaching and feelings about teaching. 


It would be possible to make comments about the limitations of self-report and questionnaire
forms of evaluation, but these would be of little consequence compared to the nature of
the qualitative remarks made by the participants. These convey feelings of dynamic rejuvenation
associated with group experiences of learning that should inspire colleagues who have
not had such experiences to explore ways in which they might. In both books, the medium 
indeed is the message. 


Both books are strongly recommended to teachers who are excited by the potential of 
small group work. 


Two omissions, however, will be mentioned. The first concerns the absence, perhaps for 
reasons of modesty, of any specific mention of the role played by the participant researchers 
in creating the supportive climate without which the participants would have been unable to 
make the revealing and constructive contributions which they did. Hence, the books are 
best regarded as indicating what can happen in small groups, rather than what necessarily 
will. The second is the lack of discussion of the institutional contexts in which the small 
group work took place. This omission is understandable given the considerations of space 
and the complexity of the issues involved. Hence this second comment is not intended as 
criticism, but an expression of the desirability of another monograph which deals with these 
issues in a similar manner to that used in the two books under review. 

DOUGLAS FINLAYSON. 


THOMAS, D. (1978). The Social Psychology of Childhood Disability. Andover:
Methuen, pp. viii + 165, p. £3.50.


The first chapter deals with definitions of social psychology and some of its topics such
as impression formation, reasons for friendship choice, attitudes, social learning, and the
behaviour of people in groups. Ways of categorising handicap are outlined, including an 
interesting ‘psycho-social’ method, which incorporates: (1) high visibility, e.g. the paraplegic’s
wheel chair, (2) problems of communication, as in deafness, (3) episodic conditions
such as asthma, epilepsy, haemophilia or maladjustment, (4) conditions carrying social 
stigma, such as mental retardation, (5) a combination of the above as in Down’s syndrome 
with its visible aspect and social stigma. A dominant interest of the author is introduced (and 
was mentioned also in the preface), namely society’s attitudes to the handicapped. This is 
treated systematically in Chapter 3 (‘Attitudes and the Handicapped’) but it is a prominent
theme also in the other chapters, the titles of which are ‘Personality and Self-Image’,
‘Socialization of the Handicapped Child’, ‘The Family and Handicapped Children’,
‘Schools and Handicapped Children’. The second chapter (‘Handicapped Children’)
includes a table published by the Department of Education and Science showing the numbers 
of special schools, pupils and teachers between 1950 and 1975. The history of educational 
provision is then discussed in a section on attitudes to handicapped children. 


The book makes no claim to any first-hand study of the behaviour and experience of 
handicapped children, but ‘to discover what is known and believed and to evaluate it in the
light of experience’. This is fair enough, given the author’s interest in so many kinds of
handicap. However, the evaluation suffers from gaps in knowledge, perhaps resulting
inevitably from the wide spread of interest, and also from generalising uncritically from
studies made in a particular area, e.g. America. When the author states that blind children 
grow up in a world of stationary objects it seems unlikely that he has watched blind children 
playing football, and when he defines the athetoid as having ‘slow, writhing movements’ he
takes no account of the frequent social problem that an athetoid’s sudden quick involuntary 
movement may throw his tea-cup across the room. The author’s comment that the socialisation
function is the prime objective in special schools, with instruction taking second place,
must arise from too little acquaintance with special schools. He regards ‘labelling’ by
sending a child to a special school as ‘punitive’, and concludes in the last sentence of the
book that “the most significant and enduring advances are likely to be achieved through
the accelerating trend of educating more and more handicapped children together with
their non-handicapped peers.” Earlier (pp. 136-7) the author had listed the difficulties as
well as the advantages of such a policy, and on p. 66 had given room to Gottlieb's (1975) 
concern at such a trend, but in the rather short section on Conclusions these qualifications 
are omitted. This emphasis is due to the author's interest, declared early in the book, in 
overcoming stigma, which he believes to be fostered by segregated schooling. Incidentally,
it is puzzling to know what meaning to ascribe to his comment that an example of slow 
progress in breaking down prejudice is that the status of the mentally retarded is not yet as 
high as that of the blind. 
Indeed it is in its thought-provoking nature that the interest and value of this book lies. 
There is a useful bibliography of twelve pages, many of the references being to recent work. 
RUTH PICKFORD. 


EDITORIAL


The British Psychological Society has accepted the nomination of Professor Dennis
Child of the University of Newcastle to serve as the next Editor of this Journal. He 
will edit the next issue. 


In this my final issue I should like to pay tribute to the many people who have 
offered advice to me over my five years as Editor. The Assistant Editors have carried 
an enormous burden of work in reading between them all the articles submitted 
and deciding on appropriate referees. I am grateful to Professor Bennett, Dr. Brown 
and, for this final issue, Dr. Trown for the efforts they have made, with the many 
referees, to maintain the standard of the Journal. 


Very special thanks go to Dr. Shirley Cunningham who has acted as Sub-Editor 
for me and for the previous Editor. She has meticulously checked the edited type- 
scripts and prepared them for typesetting, besides proof-reading the galleys and 
negotiating time schedules with the printers. A debt of gratitude is also owed to Mr. 
Douglas Grant and Miss Wilcox of Scottish Academic Press for their work on behalf 
of the Journal and to the current printers, Messrs R. and R. Clark, who took over in 
difficult circumstances with the breakdown of our previous arrangements. Gradually 
it has been possible to bring the publication of the Journal back on schedule. 


I should also like to thank the secretaries, Pam Gordon and Ginny Farrar, who 
have helped in the smooth running of a complex organisational arrangement, which 
involves processing over 150 articles a year, many of which have to be resubmitted 
after amendments suggested by the referees. It is still possible to publish only about 
a quarter of the articles received, thus the task of sifting, refereeing, and selecting the 
most appropriate is considerable. The general policy of the journal is now discussed 
regularly by the Editorial Committee, and the advice and support given is much
appreciated.
N. J. ENTWISTLE 


VERBAL INTERACTION IN NURSERY SCHOOLS


By MARGERY G. COOPER 
(School of Education, University of Durham) 


Summary. An ecological study of the quantity and distribution of language among 
children in nursery schools is reported. Two different situations, one when the presence 
of the adult is incidental and the other where she assumes a teaching role, are analysed 
for categories of verbalisation by the children. The interactive effect of the adult in both 
situations is examined. It appears that she may be responsible for a re-distribution of 
amounts of verbalisation over the categories, for drawing quantity towards herself and 
for adopting a didactic role. She in turn verbalises most with individual children. Age 
and sex differences are not significant and there is consistency in her verbalising 
procedures. 


INTRODUCTION 


SINCE language can be seen to have a crucial role not only in communication but also 
in conceptualisation and symbolic manipulation of the environment, it is not surprising 
that teachers of young children profess to promote its development among their pupils. 
Studies by Thomas (1973) and Tizard et al. (1976) indicate, however, that interaction 
between child and teacher does not figure prominently within the daily programme, 
and that dialogue is minimal. While research is continuing to strengthen our some- 
what tenuous hold on knowledge of how development takes place (e.g. Brown and 
Bellugi, 1964; Houston, 1970; Menyuk, 1971; Nelson, 1973; Bruner, 1975; Snow, 
1977), it is important to increase our knowledge about how and when children use 
language as a tool of communication and the expression of ideas. The part played 
by the adult is so important that it, too, is worthy of attention. The nursery school 
setting would seem to provide opportunities for naturalistic research, where the free 
flow of movement does not inhibit the child either in his ways of behaving or in his 
talking. 


Studies with an ecological flavour are probably the most revealing since the 
child's utterances are recorded within the natural flow of events. Different forms of 
adult interaction, ‘expansion or extension’ (Cazden, 1972), and manipulation of the
situation through increased and enriched language teaching (Woodhead, 1977) throw 
light on the qualities and quantities of communicative effort by the adult, but ecological 
investigations help to provide a firmer basis for these more specific investigations. 
Three such studies are now examined. 


Tough (1972) analysed qualities of language used in two different situations 
(nursery school and home) and so elicited strengths of language production by 60 
children with favourable and less favourable linguistic home backgrounds. In the 
more favourable environments children were found to be “beginning to use language
for recall, anticipation and planning, for explanations, for the expression of relationships
and of possibilities”. Caldwell et al. (1970) examined the development of
children from 1 to 4 years of age in a nursery day care centre. Older children were 
found to talk more, and an interesting change could be seen in the decline of simple 
conversation and the increase in questioning and role-playing. The role played by 
the teacher in eliciting language from the children clearly changed as the children's 
use of language matured. With structured groups, didactic teaching (informing) was 
a major technique up to age 3 years, but with 4-year-olds the teacher used more 
questioning. As the children grew older there was a marked increase in communica- 
tion with other children through conversation and the more complex forms of talking 
and behaving. There was also a decline in egocentric or self-directed verbalisation.


Shields and Steiner (1973) used samples of spontaneous speech in children aged
3-5 years, to examine norms of language structure in relation to situational context. 
Analysis was made not only of what the children said, but the situation in which this 
happened. Three types of pre-school groups—nursery schools, nursery classes in 
infant schools, and playgroups—were used and there were two conditions of sampling. 
The first was when children were playing spontaneously and the second when adults 
were present. Children’s use of a main clause around a verb increased with age with
a ‘plateau’ between the ages of 3½ and 4½ years. Social class differences were
insignificant. Grammatical immaturity (regularised plurals, regularised weak past 
tenses, lack of subject/verb agreement and instability of the auxiliary verb system) 
diminished with age as did grammatical omission, but working class children continued 
to be affected by colloquial omission. A straight comparison showed that conversa- 
tions between children were slightly longer than those between teacher and pupil. 
This was considered to be the result of the child withdrawing from initiative and 
falling more into a respondent role. However, communication between adult and 
child, when the child’s area of thinking and his interests were being explored, was 
longer than any other category. 


There is an ecological flavour to all these three studies, but in the present study 
the observational technique A.P.P.R.O.A.C.H. developed by Caldwell et al. (1970) 
was adopted since part of it could be used to describe the verbal behaviours of young 
children in natural social situations. The method seemed particularly suitable to 
handle the complexity of the situation in the nursery school, where complicated 
technological devices might prove restrictive and cumbersome. A subsidiary aim of 
the study was to demonstrate that teachers themselves might well make use of 
observational techniques to answer questions about the incidence and nature of the 
spontaneous behaviours of children in nursery schools. 


METHOD 


In using the ‘Procedure for Patterning Responses of Adults and Children’,
acronym A.P.P.R.O.A.C.H. (Caldwell et al., 1970), the objective was to obtain a full
statement of all the behaviours of a child or adult during a stated period. Using a 
small portable recording machine, observers spoke quietly into it recording aspects 
of the behaviour of either the child or adult, giving all significant details about 
interaction with other people or objects. The details were transcribed from the tape, 
quantitatively analysed and assembled for computerisation. 


The unit of emitted behaviour was a behavioural clause designated by the 
appearance in the record of a verb, e.g. ‘Mark throws his toys to Jim hurriedly’.
Each clause had four basic components: the subject, the predicate, the object and a 
few selected qualifiers (adverbs) providing supplementary information. Although 
the behaviours of the children and adults covered a wide spectrum, verbal behaviour 
took precedence and this was always coded when the subject spoke. The preceding 
four components plus a fifth made a 5-digit statement (two digits were required for 
the wide array of predicates). This gave a numerical language:


    Mark      throws      his toy to Jim      hurriedly
     0          29              5                 0


Mark (0) is the central figure; 29 is his action; 5 is the person to whom he throws it (a male
child); and the final 0 is the manner of throwing.

Had Mark shouted when doing so the coding would have read 0, 29, 5, 1 showing
he had verbalised. 
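
The coding step just described can be illustrated with a short sketch. It is an illustration only: the class and field names are hypothetical, and the only code values used are those quoted in the example (0 for the central figure, 29 for the act of throwing, 5 for a male child, and a final digit of 1 when the subject verbalised). The full code book is given by Caldwell et al. (1970) and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviouralClause:
    """One coded clause. Field names are hypothetical, chosen for this sketch."""
    subject: int    # one digit, e.g. 0 = the central figure
    predicate: int  # two digits, e.g. 29 = throws
    obj: int        # one digit, e.g. 5 = a male child
    qualifier: int  # one digit, 0 in the printed example, 1 when the subject verbalised

    def as_statement(self) -> str:
        """Return the five-digit statement; the predicate occupies two digits."""
        return f"{self.subject}{self.predicate:02d}{self.obj}{self.qualifier}"


# 'Mark throws his toy to Jim hurriedly', without and with a shout.
silent = BehaviouralClause(subject=0, predicate=29, obj=5, qualifier=0)
shouted = BehaviouralClause(subject=0, predicate=29, obj=5, qualifier=1)
print(silent.as_statement())   # 02950
print(shouted.as_statement())  # 02951
```

Padding the predicate to two digits keeps every statement at five digits, which is what allows the records to be assembled in fixed columns for computer analysis.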


The code, full details of which are provided by the authors, “permits a running
sequential picture of actions emitted by the central figure of the observation and the
behaviours received by him. In general, the resulting description is very fine-grained
containing much that might be considered irrelevant for some types of behaviour
analysis but, at the same time, rich in sequential interactional data” (Caldwell et al.,
1970). 


The ‘dry numerical language’ provided a concise method of ordering behavioural
records. Differences in wording could be controlled by the necessity of reduction to
a simple statement, and observers (teachers) were trained. Two experienced coders
acted as ‘second observers’ to each learner and inter-observer agreement:


    Agreement = (Nos. agreed − Nos. different) / Nos. agreed


was eventually established, with an inter-rater correlation of +0.80.
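
As a worked illustration of the agreement index, the sketch below reads the printed fraction as (Nos. agreed − Nos. different) / Nos. agreed. The function name and the counts are hypothetical and are not data from the study.

```python
def observer_agreement(n_agreed: int, n_different: int) -> float:
    """Agreement = (Nos. agreed - Nos. different) / Nos. agreed."""
    if n_agreed <= 0:
        raise ValueError("n_agreed must be a positive count")
    return (n_agreed - n_different) / n_agreed


# Hypothetical practice record: 50 codings agreed, 10 differed.
example = observer_agreement(n_agreed=50, n_different=10)
print(f"{example:.2f}")  # 0.80
```

On this reading the index falls as disagreements accumulate and reaches 1 only when the two observers never differ.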


Teacher presence and its effect on the amount and distribution of verbalisation 
was the variable chosen for analysis. Whenever a child or adult spoke, what was 
said, and any accompanying behaviour, were recorded. Obviously much of the 
children's behaviour was not complemented by sounds or talk, but for the purpose 
of this analysis the following categories were selected: 


(1) the number of instances of verbalisation, 
(2) to whom it was directed, 
(3) the nature of the verbalisation according to selected categories. 


The term ‘verbalisation’ is preferred since some of the children’s utterances, though
onomatopoeic, carrying sound, intention and communication to the full, could
scarcely be termed ‘talk’. The children (N = 112; 55 boys and 57 girls) attending
seven nursery schools were aged from 36-50 months, median 43.7 months. The
time unit of observation was 10 minutes within the period (9.30 a.m.-11 a.m.) when 
two different samples from each child were taken, one where the adult might intervene 
incidentally (unstructured) and the second, when adult presence was structured into 
the sample at the beginning (structured). Analyses of the situations were intended 
to be heuristic and descriptive rather than directed to tests of specific hypotheses. 
However, simple statistical techniques (χ², t tests of correlated means, and tests of
correlation coefficients) allowed the testing of the null hypothesis in some instances. 


Because the natural flow of behaviour was at all times to be upheld, observers
often found a ‘structured’ sample dissolving, as for example when the child moved
away, or an ‘unstructured’ sample changing its character when the teacher over-controlled
apparently ‘open’ behaviour. Such samples were rejected though it was
conceivable that incidental adult effect could not be entirely eliminated from ‘unstructured’
samples. Observational studies, where the special strength is reflection
of spontaneous behaviour in natural surroundings with minimal research controls, 
are, however, regulated by close definition of the categories to be observed. As the 
emphasis in this study was on what was being said and to whom, it is necessary now 
to define the categories of verbalisation. 


Categories of verbalisation 

Preliminary studies had shown that the verbal behaviours tended to group 
themselves more intensely around some categories than others. Eventually nine 
categories were adopted. 
(1) Confirms
Information is given which indicates that the pattern of behaviour being emitted
is appropriate or correct or which confirms to another person (child or adult) that
she has made correctly some informational assumption or hypothesis. Examples
would be “Yes” or “That is right”. Sometimes an adult asked a question in a
general way, “Who wants to . . .”, implying confirmation and received by the
individual as “Do you want to . . .”, and identification of self as in “I do” is coded
as Confirmation.
Code 10


(2) Converses 
Informational transactions are carried out through brief or casual statements 
about routine situations. The person to whom the information is being given is 
clearly noted. Examples include, “That is a big big bang”, i.e. below the informational/teaching
level and being somewhat low level conversational in tone.
Code 12 
(3) Inquires 
Questions are asked, the content of which refers to information and the answers 
to which involve the giving, completing or correcting of information. Examples
include asking for labels, “What is it called?”; for confirmations, “How do we do
this?”; for reassurance, “Is this right?”.
Code 16


(4) Informs or teaches 
This involves the giving of information to another person. Examples include 
discussing, clarifying, explaining, and it is done in a more formal manner than in 
predicate (2). 
Code 17 
(5) Role plays 
This category refers to creative and imaginative verbalisation. A child walking 
around carrying some insignia may be role playing but he will have said, “I’m a
fireman. I’m going to put the fire out”, to have been recorded in this section. He
will also have said it clearly and it would be likely that there was a perceivable 
recipient of the role-playing remark. 
Code 19 


(6) Reinforcement—negative 
The code category covers several types of negative reinforcement including 
showing discomfort, expressing displeasure, criticising, interfering, threatening, and 
assaulting. Action usually accompanied the verbalisation. 
Codes 30-38 
(7) Reinforcement—positive 
Again the code category itemises across a range of behaviour with a positive 
emphasis, supporting the on-going behaviour of another person, e.g. showing pleasure, 
approving, expressing affection, promising, etc. Action and verbalisation were closely 
linked. 
Code 40 


(8) Control techniques—suggests 
These verbalisations describe an implied request given in the form of a declarative 
or interrogative rather than an imperative. However, the control element is strongly
implied: the suggestion is to stimulate action which is structured into the verbalisation.
Examples would include, “May I go to see the gerbils, please?” from a child to the
teacher. The teacher might reply, “Can you find the way yourself?”. The control
is indirect but the implication is strong. 
Code 70 
(9) Control techniques—requests
Reference here is to clear requests for action on the part of the person to whom
the request is directed, or else to requests for permission for the central figure to carry
out some type of activity. A teacher might say “David, I want you to pick up your
blocks”. A child asked the teacher, “May I go to look now?”. The request carried
a degree of intensity or control since it carried with it no option other than that called
for by the request, “Will you please . . .” (a note of exasperation!).
Code 71 
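
The nine categories and their codes lend themselves to a simple lookup when records are tallied. The sketch below is a hedged illustration: the codes are those listed in the definitions above, but the function name, the category labels as spelt here and the sample record are hypothetical.

```python
from collections import Counter
from typing import Optional

# Codes as listed in the nine category definitions.
SINGLE_CODES = {
    10: "Confirms",
    12: "Converses",
    16: "Inquires",
    17: "Informs or teaches",
    19: "Role plays",
    40: "Reinforcement (positive)",
    70: "Control techniques (suggests)",
    71: "Control techniques (requests)",
}


def category_for(code: int) -> Optional[str]:
    """Return the verbalisation category for a predicate code, or None."""
    if 30 <= code <= 38:  # negative reinforcement spans a range of codes
        return "Reinforcement (negative)"
    return SINGLE_CODES.get(code)


# Hypothetical predicate codes from one 10-minute record; 29 (throws)
# is not a verbalisation category and is therefore left out of the tally.
record = [12, 12, 16, 29, 17, 33, 71, 10]
tally = Counter(c for c in map(category_for, record) if c is not None)
print(tally.most_common())
```

Keeping the negative reinforcement range separate from the single codes mirrors the way the category is defined as a band (Codes 30-38) rather than one value.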


RESULTS 


From the raw score it appears in Table 1 that there is little overall difference in 
the amount of verbalising between boys and girls. Older children verbalise more than 
younger, confirming the well documented developmental trend.


TABLE 1

NUMBER OF INSTANCES OF VERBALISATION AND MEAN NUMBER OF WORDS

                           Number of    Mean no.
Group              N       instances    of words      SD      Range
Young Boys         24         591        24.63      11.61      5-52
Older Boys         31         957        30.87      16.17      9-79
Young Girls        23         457        19.87      12.97      6-52
Older Girls        34        1177        34.62      17.07      8-85
All Boys           55        1548        28.15      14.69      5-79
All Girls          57        1634        28.67      17.31      8-85
All younger        47        1048        22.30      12.51      5-52
All older          65        2134        32.83      16.75      8-85
Total             112        3182        28.41      15.98      5-85


Studies of vocabulary growth (Smith, 1926; Templin, 1957) show a rapid increase in vocabulary as children
move from naming to more symbolic language when the coherent possibilities of 
language functions are appreciated. The results do, however, indicate that in this 
sample younger boys verbalised more than younger girls, while there were more 
instances of older girls verbalising than older boys. 


The developmental trend is confirmed for both sexes. The higher mean number 
of words for the older children shows a wider distribution about the mean and the 
range is greater. 
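
The means in Table 1 can be checked against the other columns. The sketch below assumes, as the printed figures suggest but the text does not state, that each mean is the number of instances divided by N; the variable names are illustrative.

```python
# (N, number of instances, printed mean) for each row of Table 1.
TABLE_1 = {
    "Young Boys": (24, 591, 24.63),
    "Older Boys": (31, 957, 30.87),
    "Young Girls": (23, 457, 19.87),
    "Older Girls": (34, 1177, 34.62),
    "All Boys": (55, 1548, 28.15),
    "All Girls": (57, 1634, 28.67),
    "All younger": (47, 1048, 22.30),
    "All older": (65, 2134, 32.83),
    "Total": (112, 3182, 28.41),
}

for group, (n, instances, printed_mean) in TABLE_1.items():
    computed = instances / n
    # Allow for rounding to two decimal places in the printed table.
    agrees = abs(computed - printed_mean) < 0.006
    print(f"{group:<12} computed {computed:6.3f}  printed {printed_mean:5.2f}  agrees: {agrees}")
```

Every row agrees to within rounding, and the sub-group instances also sum to the totals (for example 591 + 957 = 1548 for all boys).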

The variable of adult presence is now introduced. The two 10-minute sequences 
called Unstructured and Structured differ in that Unstructured indicates the absence 
of adult involvement with the child except for incidental intervention which cannot 
be prevented in a natural situation with young children. Structured indicates that the 
sequence began with the adult involved in the situation; it was adult directed. 


Table 2 shows from a ‘t’ test that a higher mean number of words were uttered
in the structured situation while the teacher is present (P < 0.01), a finding which is
in line with that of Shields (1972) who found that the mean utterance length was 
increased when teacher and children were discussing areas of interest. 

In the structured situation the increase in amount of verbalisation by the children 
is not dramatic, partly because the adult is controlling the situation (Shields, 1972), 
leaving the children less free to talk with their peers and to take the initiative. 


TABLE 2

INSTANCES OF VERBALISATION IN THE UNSTRUCTURED AND STRUCTURED SITUATIONS

                          Unstructured                              Structured
                No. of     Mean no.                      No. of     Mean no.
Group          instances   of words     SD     Range    instances   of words     SD     Range
Young Boys        247       10.29      7.31     1-24       344       14.33      7.11     1-29
Older Boys        456       14.71      9.41     2-36       501       16.16      9.19     2-43
Young Girls       176        7.65      7.86     0-28       281       12.22      8.76     0-38
Older Girls       549       16.15     10.11     2-42       628       18.47     11.84     1-43


TABLE 3

PERCENTAGE OCCURRENCE OF DIFFERENT CATEGORIES OF VERBALISATION BY SUB-GROUPS

                                        Unstructured                               Structured
                           Young   Older   Young   Older            Young   Older   Young   Older
Category                   Boys    Boys    Girls   Girls   Total    Boys    Boys    Girls   Girls   Total
Information processing
  Confirms                 0.41    0.62    0.33    1.24    2.60     1.61    1.94    2.60    2.76    8.91
  Converses                0.80    4.00    0.70    2.60    8.10     2.02    0.96    0.62    1.15    4.75
  Inquires                 0.99    1.07    0.74    2.27    5.07     1.65    1.90    0.62    1.40    5.57
  Informs                  2.93    5.68    2.35    6.06   17.02     5.13   10.40    4.94   15.66   36.13
  Role-plays               1.69    1.73    0.74    3.54    7.70     1.07    1.19    0.04    1.86    4.16
  No. of instances           -       -       -       -      983       -       -       -       -     1444
Reinforcement technique
  Negative                 4.70    9.84    4.04   10.30   28.88     4.91    9.39    2.92   10.07   27.29
  Positive                 4.47    9.18    3.14    9.85   26.60     4.25    4.70    2.69    5.60   17.24
  No. of instances           -       -       -       -      248       -       -       -       -      199
Control technique
  Suggests                 4.55    7.14    2.92    3.90   18.51     1.62    5.84    0.97    3.57   12.00
  Requests                 8.77    9.73    5.18   21.77   45.45     6.49    7.14    2.28    8.12   24.03
  No. of instances           -       -       -       -      197       -       -       -       -      111





The structured situations were adult directed which might account for the
decrease in conversation, except in the case of the young boys who showed a marked
increase. Perhaps they were not over-awed by the formality and thus seized the 
proximity of the adult to continue to converse. The large amount of confirming may 
well reflect the teaching role. Decrease is shown in role play for all groups and in the 
reinforcement techniques and control techniques. The adult in charge of the situation 
controls by her presence and appears to use the situation to promote information 
bearing sequences with confirmatory comment. 
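
A note on reading the percentages in Table 3 may help. The printed totals suggest that each percentage is taken within a block of categories (information processing, reinforcement, control) over both situations combined; this is an inference from the figures, not something the text states. The sketch below tests that reading against the instance counts printed in the table, and converts one percentage back to a count. Function names are illustrative.

```python
# Instance counts printed in Table 3: (unstructured, structured).
INSTANCES = {
    "Information processing": (983, 1444),
    "Reinforcement technique": (248, 199),
    "Control technique": (197, 111),
}


def block_shares(unstructured: int, structured: int) -> tuple:
    """Percentage of a block's instances falling in each situation."""
    total = unstructured + structured
    return 100 * unstructured / total, 100 * structured / total


def instances_from_percentage(percentage: float, block: str) -> int:
    """Convert a Table 3 percentage back to a whole number of instances."""
    total = sum(INSTANCES[block])
    return round(percentage * total / 100)


for block, (u, s) in INSTANCES.items():
    share_u, share_s = block_shares(u, s)
    print(f"{block}: {share_u:.2f}% unstructured, {share_s:.2f}% structured")

# Confirming in the structured situation is printed as 8.91 per cent.
confirms_structured = instances_from_percentage(8.91, "Information processing")
print(confirms_structured)  # 216
```

On this reading the unstructured information-processing categories should sum to about 40.5 per cent, against 40.49 from the printed totals, and the 8.91 per cent for confirming corresponds to the 216 instances quoted in the text.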


A χ² test shows that presence of the adult increases to a significant extent the
amount of information processing in confirming, inquiring and informing with a
subsequent change in verbalisation categories (P < 0.01). The rank order of the
categories is changed by the presence of the adult: there is no significant
Spearman correlation (ρ = 0.18) between the two rank orders of categories. The
most dramatic change is in the confirming category, which plays a small part in
verbalisation procedures in the unstructured situation. In the presence of the adult,
however, confirming occupies second place and of the 216 instances 200 (Table 6)
are with the adult. A major effect of adult presence is a re-distribution of amount of
verbalisation over the categories.
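
The rank-order comparison described above can be illustrated with a short sketch. It applies the usual Spearman formula to the Total columns of Table 3 for the five information processing categories only. Which categories and which figures were actually ranked is not stated in the text, so this choice is an assumption and the result is not expected to reproduce the reported coefficient; the function names are illustrative.

```python
def ranks_descending(values):
    """Give rank 1 to the largest value (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Table 3 "Total" percentages, information processing categories only
# (assumed subset; the text does not say which rows were ranked).
categories = ["Confirms", "Converses", "Inquires", "Informs", "Role-plays"]
unstructured = [2.60, 8.10, 5.07, 17.02, 7.70]
structured = [8.91, 4.75, 5.57, 36.13, 4.16]

rho = spearman_rho(unstructured, structured)
for name, u, s in zip(categories,
                      ranks_descending(unstructured),
                      ranks_descending(structured)):
    print(f"{name:<11} unstructured rank {u}, structured rank {s}")
print(f"rho = {rho:.2f}")
```

On these figures confirming moves from last place to second, which is the shift the text singles out.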


In order to determine the relative importance of the inferred factors of age and 
sex, an analysis of variance using an approximate method of unweighted means was 
chosen, since the design was not balanced. The analysis is thus only approximate. 
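
A minimal sketch of an unweighted means analysis for a two-factor design with unequal cell sizes follows. It uses the textbook form of the method, in which the harmonic mean of the cell sizes stands in for a common n; whether the authors used exactly this variant is an assumption, and the scores below are hypothetical, not data from the study.

```python
from statistics import mean


def unweighted_means_f_ratios(cells):
    """cells[i][j] holds the scores for age level i and sex level j."""
    a = len(cells)         # number of age levels
    b = len(cells[0])      # number of sex levels
    cell_means = [[mean(scores) for scores in row] for row in cells]
    sizes = [len(scores) for row in cells for scores in row]

    # Harmonic mean of the cell sizes replaces the common n of a balanced design.
    n_h = (a * b) / sum(1 / n for n in sizes)

    # Marginal means are plain (unweighted) means of the cell means.
    grand = mean([m for row in cell_means for m in row])
    row_means = [mean(row) for row in cell_means]
    col_means = [mean([cell_means[i][j] for i in range(a)]) for j in range(b)]

    ss_age = n_h * b * sum((r - grand) ** 2 for r in row_means)
    ss_sex = n_h * a * sum((c - grand) ** 2 for c in col_means)
    ss_inter = n_h * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )

    # The error term is the ordinary pooled within-cell mean square.
    ss_within = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_within = ss_within / (sum(sizes) - a * b)

    return {
        "age": ss_age / (a - 1) / ms_within,
        "sex": ss_sex / (b - 1) / ms_within,
        "age_x_sex": ss_inter / ((a - 1) * (b - 1)) / ms_within,
    }


# Hypothetical scores: rows are younger/older, columns are boys/girls.
example = [
    [[3, 5, 4], [2, 4]],
    [[7, 9, 8, 6], [10, 12, 11]],
]
print(unweighted_means_f_ratios(example))
```

With equal cell sizes the harmonic mean equals the common n and the F ratios coincide with those of the ordinary balanced analysis, which is why the method is described as approximate only for unbalanced designs.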


Table 4 shows the difference between the presence and absence of the adult. In
confirming, adult presence emphasises the age factor (P < 0.01); the older girls are
always prominent but adult presence increases the verbal behaviour of the younger
girls.


TABLE 4

F RATIOS FOR CATEGORIES OF VERBALISATION

                      Unstructured                 Structured
                 Age       Sex    Age/Sex     Age       Sex    Age/Sex
Confirms         2.51      0.75    1.51       4.24**    0.42    0.42
Converses       16.61**    2.59    2.52       0.31      5.07**  2.10
Inquires         1.29      0.90    2.83*      1.10      2.11    2.61
Informs          5.32**    0.15    0.45       0.68      5.57**  0.70
Role-Plays       1.06      0.34    2.40       0.68      0.22    2.20
Negative R.      3.15*     0.29    0.41       3.30*     0.26    0.38
Positive R.      1.58      0.18    0.74       0.62      0.28    0.68
Suggests         0.89      0.87    0.42       3.38*     1.19    0.34
Requests         0.96      1.88    0.13       0.43      1.16    1.40
* P < 0.05   ** P < 0.01

In conversing there is evidence contradicting the traditional idea that girls
talk more than boys. The older boys converse the most when the adult is not
present and their loquaciousness helps the age factor to become significant (P < 0.01).
When, however, the adult is present the significant factor becomes that of sex
(P < 0.01); then the younger boys are the only group to increase their amount of
conversation.


In inquiring, a more direct form of verbalisation, the age/sex factor is significant
(P < 0.05), highlighting the older girls when the adult is not present. In informing, in
the same situation, age alone (P < 0.01) is significant. It appears that the older children
inform the younger but when the adult is present it is the older girls (P < 0.01)
particularly who continue this activity. It may be that the boys defer more to her
presence. All groups of children increase their informing, possibly because the adult
is setting a didactic climate and is eliciting information as part of her function.


In negative reinforcement the age factor is significant (P < 0.05) in both
unstructured and structured situations. The older children appear to accompany their
threats, criticisms and hostilities with words, while the younger do not. In suggesting,
the presence of the adult aids the older, more verbal, children. Age is the significant
factor (P < 0.05). Adult presence appears to affect verbal behaviour both across the
categories and within the groups.
In the adult’s presence, there is a decrease in the number of instances to peers
and self with a substantial increase of verbalisation to the adult (see Table 5). The
rank orders of distribution are also changed, the male child receiving most
verbalisation in the unstructured situation and the adult in the structured. Rank order
correlation between the two situations is, however, significant (P < 0.01). Tests of
correlated means for all groups show the differences between the two situations to be
significant (P < 0.01). When the children are talking to their peers as individuals
there is higher incidence in the unstructured situation, the adult drawing the talk
towards herself in the more formal situation. Tests of correlated means between the
two situations show significance only for the group of older girls (P < 0.01). The
children talk less to groups than to individuals in both situations, but the difference
between the two situations is not significant. In talking to self, differences between
the two situations are not significant for young boys or girls or older girls. Older
boys, however, can be seen to be directing talk to themselves in the unstructured
situation and the difference between the two situations is significant (P < 0.01). This
finding is of interest, reflecting perhaps the slower growth of sociability among boys
and their predilection for more objective tasks (Moore, 1967).


TABLE 5

PERCENTAGE DISTRIBUTION OF VERBALISATION: BY CHILDREN TALKING TO ADULT, PEERS AND SELF

                             Unstructured                          Structured
                   Young  Older  Young  Older           Young  Older  Young  Older
Groups             Boys   Boys   Girls  Girls  Total    Boys   Boys   Girls  Girls  Total
Adults              1.45   2.74   1.13   2.42   7.74     6.97  10.42   6.46  15.27  39.12
Female Child        1.73   1.28   2.17   7.79  12.97     0.50   0.54   0.31   2.73   4.08
Male Child          2.80   5.82   1.29   4.12  14.03     1.57   3.14   0.41   0.85   5.97
Group               0.72   1.23   0.34   1.57   3.86     0.50   0.72   0.35   1.19   2.76
Self                1.10   3.24   0.60   1.35   6.19     1.29   0.88   0.28   0.69   3.14
No. of instances      -      -      -      -    1428       -      -      -      -    1754


The distribution of the children's verbalisation to the adult is shown in Table 6.
There is an increase in each category in the structured situation and this is dramatic
in the confirms and informs categories. The adult promotes the exchange of
information and her presence apparently prompts the children to increased questioning. The
rank order of categories in the two situations shows a positive correlation (P < 0.01),
suggesting that the change of verbalisation between the two situations is one of
quantity rather than quality when the adult is the main recipient. When, however,
the order of categories of the children's total verbalisation (Table 3) is ranked with
that of the adult in both situations, the correlations are not significant (ρ = 0.25
unstructured and ρ = 0.32 structured). The qualities of talk are differently
distributed when the adult is present.


When the adult becomes the central figure it is undoubtedly in the structured
situation, when she is leading the conversation, that she verbalises most (see Table 7)
(P < 0.01). Her presence in the unstructured, being incidental, lowers the raw score
but it is of interest that the rank-order correlation of categories between the two
situations is highly significant (P < 0.01). There is, therefore, a high consistency in
her verbalising procedures whether incidental or structured.


Over the four main categories of information processing excluding role play, the
adult is verbalising significantly more in the structured situation (P < 0.01). Sex
differences are not significant in either situation but age differences are in the
structured situation (P < 0.01). The adult is talking more with older than younger
children, and more with older girls than older boys (P < 0.01) when she is controlling
the situation.

The incidence of adult talk in role play is slight, as in negative reinforcement.
Verbal positive reinforcement is significantly greater in the structured situation
(P < 0.01). It is apparent that the adult has a favoured technique of control through
verbalisation: she suggests rather than requests. Differences are significant in both
situations (P < 0.01). Caldwell et al. (1970) also found the adult preferring this control
technique and La Belle and Rust (1973) found the adult using personal motivation
for control rather than coercive methods.


TABLE 6

PERCENTAGE OCCURRENCE OF DIFFERENT CATEGORIES OF VERBALISATION BY CHILDREN TALKING TO THE ADULT

                            Unstructured                  Structured
                     Young  Older  Young  Older    Older  Young  Older
Categories           Boys   Boys   Girls  Girls    Boys   Girls  Girls   Total
Information processing
  Confirms            0.23   0.76   0.30   0.83     3.26   4.54   4.77   15.14
  Converses           0.07   1.06   0.23   0.91     1.06   0.76   0.91    4.24
  Inquires            0.61   0.38   0.15   0.83     2.73   0.68   2.04    7.80
  Informs             1.59   2.57   1.67   2.05    15.31   8.32  24.47   55.54
  Role-Plays          0.23   0.30   0.00   0.30     0.30   0.07   1.59    2.26
  No. of instances      -      -      -      -        -      -      -     1122
Reinforcement techniques
  Negative            2.50   5.00   0.00   2.50    12.50   7.50  10.00   31.25
  Positive            0.00   8.75   1.25   3.75     8.73   5.00  16.25   44.97
  No. of instances      -      -      -      -        -      -      -       61
Control techniques
  Suggests            1.10   3.30   0.00   1.10     5.50   2.20   5.50   15.60
  Requests            7.68   6.60   4.40   6.60    14.31   4.39  15.24   51.50
  No. of instances      -      -      -      -        -      -      -       63


TABLE 7

PERCENTAGE VERBALISATION OF DIFFERENT CATEGORIES BY THE ADULT

                            Unstructured                  Structured
                     Young  Older  Young  Older    Older  Young  Older
Categories           Boys   Boys   Girls  Girls    Boys   Girls  Girls   Total
Information processing
  Confirms            0.42   0.84   0.17   0.68     1.42   0.84   2.59    5.94
  Converses           0.00   0.33   0.42   0.17     0.42   1.34   1.84    4.35
  Inquires            1.17   1.84   1.92   1.84    12.17  12.13  12.17   46.06
  Informs             0.67   0.58   0.33   1.34     8.44   4.93   9.18   27.40
  Role-Plays          0.00   0.00   0.00   0.33     1.00   0.42   1.59    3.18
  No. of instances      -      -      -      -        -      -      -     1041
Reinforcement techniques
  Negative            0.00   0.00   0.00   0.00     1.42   0.00   0.47    4.73
  Positive            1.43   8.06   1.89   5.21    16.60  20.85  30.84   78.75
  No. of instances      -      -      -      -        -      -      -      176
Control technique
  Suggests            0.00   1.36   1.97   1.97    15.39  19.44  23.55   67.49
  Requests            0.74   1.48   0.37   1.20     6.15   7.02   5.23   23.34
  No. of instances      -      -      -      -


TABLE 8

PERCENTAGE CATEGORIES OF VERBALISATION BY THE ADULT TO INDIVIDUAL CHILDREN AND TO GROUPS

                              Unstructured                          Structured
                     Individual      Group                 Individual      Group
Categories           Boys   Girls  Boys   Girls  Total     Boys   Girls  Boys   Girls  Total
Information processing
  Confirms            1.25   0.84   0.00   0.00   2.09      2.42   3.34   0.08   0.08   5.92
  Converses           0.33   0.58   0.00   0.00   0.91      1.09   2.92   0.08   0.25   4.34
  Inquires            2.67   3.34   0.33   0.42   6.76     20.06  21.54   1.75   2.77  46.12
  Informs             1.25   1.50   0.00   0.17   2.92     11.45  11.86   1.84   2.25  27.40
  Role-Plays          0.00   0.33   0.00   0.00   0.33      0.83   2.01   0.33   0.00   3.17
  No. of instances      -      -      -      -     156        -      -      -      -    1041
Reinforcement technique
  Negative R.         0.00   0.00   0.00   0.00   0.00      4.27   0.47   0.00   0.00   4.74
  Positive R.         9.47   6.64   0.00   0.48  16.59     26.04  48.78   0.94   2.83  78.59
  No. of instances      -      -      -      -      35        -      -      -      -     176
Control technique
  Suggests            0.98   3.70   0.37   0.25   5.30     22.81  40.76   1.75   2.23  67.55
  Requests            2.09   1.36   0.12   0.25   3.82      9.97  11.09   1.10   1.24  23.40
  No. of instances      -      -      -      -      74        -      -      -      -     738


Table 8 shows the distribution of the adult's verbalisation to individual children
and to groups. Here there is evidence of little verbal interaction with the group, most
of it being to the individual child. Nevertheless it is of interest that the adult is found
addressing groups of children, as her treatment of the group might help to strengthen
its cohesion and promote within the children a feeling of ‘group mindedness’.


Nursery school teachers are conscious of the part they can play in promoting
feelings of co-operation among children. The comparative increase of verbalisation
to the group in the structured situation tends to confirm the results of Caldwell et al.
(1970) who found the adult using ‘inquiring’ and ‘informing’ to older children and
in group situations. While the teacher in this study had a tendency to structure a
group setting increasingly with older children, observations here suggest that as the
teacher was interacting with the pupil other children joined in and a group was formed.


DISCUSSION


Two 10-minute periods of observation produced a wealth of detail about the 
amount of talking and its distribution over selected categories, and between different 
individuals. In overall amount there was little sex difference, boys talking almost as 
much as girls, but there was a developmental trend. Older children, both boys and 
girls, talked more than the younger children. There were more instances and there 
was a higher mean number of words with a wider range of length of utterance. 


Adult presence, the difference between the unstructured and structured situation, 
was apparently responsible for striking differences. The adult increased the amount 
of talking in all groups, and increased the range of length of utterance. This was 
significantly so in the categories of confirming, inquiring, and informing, thus 
suggesting that the adult viewed her role in that situation as a didactic one. Distribu- 
tion of the qualities of verbalising was altered by adult presence, and factors of age 
(in confirming) and sex (in conversing and informing) were found to be significant. 
The older children, particularly the girls, talked more to the adult, probably because 
they were more conscious of the way she was playing the verbal game. Age, too, 
was shown to be associated with forms of control when talking, both in negative 
reinforcement and suggesting. Possibly the adult model was more imitated by the 
older children as its effectiveness was appreciated. Suggesting was, indeed, the 
favoured form of verbal control by the adult. The role of the adult in promoting 
both quantity and quality of verbal expression thus seems to be clearly demonstrable. 
Another recent study (Wells, 1977) has emphasised the importance of qualitative 
language interaction for subsequent success in school. Teachers in nursery schools 
would thus seem to have a key role to play. More research is needed to highlight 
qualities of language, the categories which need emphasis, as well as strategies to 
promote more interaction. 


When the presence of the adult was incidental and minimal as in the unstructured 
situations the older children indulged in the more casual conversing, but they also 
were the informants. They talked more to the other children in both situations and 
girls tended to talk to girls and boys to boys. Younger boys more readily took the 
opportunity of talking to the adult. The transcripts of younger boys showed the 
dyadic nature of the sequences. Those of the younger girls had more interruptions 
and more inclusions of other children. The emphasis on talking to the individual,
whether peer or adult, could be seen as a reflection of the social level of nursery-aged
children and the social climate of the nursery school. But groups were in evidence in
both situations. The fact that there was less talking to groups in the structured
situation tended to suggest that when the adult was present the children were less
conscious of group structure. Talking to self also decreased in adult presence, except
in the case of younger boys. The high incidence of self-talk in older boys
without adult presence could reflect greater concentration on problem solving
activities (Moore, 1967) and less tendency for ‘affiliation’ (Hutt, 1972). The
traditional superiority of the female in language usage is thus questioned, which
lends some support to Macaulay (1977).


Perhaps the chief educational implication of the study is the necessity to increase
the linguistic interaction between teacher and children. The transcripts revealed 
firstly, children using language in a very matter-of-fact way to convey information 
(though hardly ever to ask for it), secondly, expressing their fantasy (though this was 
seldom stimulated), and thirdly, communicating their presence. There were few 
instances of trying out new words, of creativity, or of making play with words. The 
qualities of language can, at this stage, only be enriched through adult participation 
where the adult is conscious of the need for enrichment, and has language facility to 
promote expansion and extension of ideas. 
THE HEN OR THE EGG? WHICH COMES FIRST: ANTISOCIAL
EMOTIONAL DISORDERS OR READING DISABILITY?


By PAQUITA McMICHAEL
(Department of Psychology, Moray House College of Education, Edinburgh)


SUMMARY. The sequential relationship of antisocial emotional disorders and reading 
difficulties was examined in a longitudinal study of a sample of 198 boys in their first 
two years at school. Teachers completed behaviour questionnaires at school entry and 
at the end of Primary One and Two. The children were tested for reading readiness at 
school entry and for reading performance at the end of Primary One and Two. Anti- 
social deviance was shown to precede later reading difficulties, but was associated at 
school entry with poor performance on the tests of reading readiness. An examination 
of the predictive effects of low performance on reading readiness tests and the reading 
test at the end of Primary One indicated that later antisocial deviance was not related 
at this stage of school to earlier poor performance. These results call into question, for 
this age group, the suggestion that reading backwardness leads to antisocial emotional 
disorders.


INTRODUCTION


In 1970 Rutter and Yule renewed interest in a neglected aspect of school failure—the 
association of antisocial, rather than neurotic, types of emotional disorder with 
learning disabilities. This association had been previously noted by the Gluecks 
(1950), Harris (1961) and Douglas and Ross (1968) amongst others and has also 
appeared in surveys by Clark (1970), Sturge (1972) and Davie et al. (1972). The 
specific nature of the relationship has increasingly received attention with questions 
focusing on the aetiology of such a joint disturbance of functioning and on the 
sequence of events. 


The argument that reading disabilities precipitate antisocial behaviour has hinged 
on the rejection by peers and loss of self-esteem which has been observed amongst 
learning disabled children (Mangus, 1950; Chazan, 1963; Campbell, 1964; Gregory, 
1965). Mangus (1950) proposed that loss of self-confidence, together with educational 
neglect and rejection, might lead on to revenge against society. 


The sequential possibilities are: 


(1) that reading disabilities precipitate antisocial behaviour, 

(2) that antisocial types of emotional disorder lead to problems in reading, or 

(3) that both occur together and are dependent on factors outside the schools’ 
immediate control. 


In support of the first possibility Rutter and Yule (1970) reported that Isle of 
Wight children, retarded in reading, had been in difficulty from an early age, some of 
them at the time of the inquiry (aged 9, 10 and 11) still being effectively non-readers. 
If they had been engaged in delinquent activities these had been of later onset than 
their reading problems. They also commented on a tendency (non-significant) for 
the antisocial boys who were retarded in reading to have shown a slightly later onset 
for their antisocial behaviour than those antisocial children whose reading was normal. 
They suggested that antisocial disorders might begin somewhat later when they were 
associated with reading problems. 


Morris (1966) did not make distinctions between the types of disorder associated 
with reading disability. She nevertheless noted in her longitudinal study that the poor 
readers, by the time they reached the last two years of their primary course, were 
showing a larger number of signs of maladjustment and unsettledness than the good 
readers, being inhibited and anxious, but also aggressive and hostile to adults and
children. 


Barnett (1972), also finding that the backward readers amongst his sample of 
eight-year-olds were hostile to adults and children, came to the conclusion that their 
hostility was a result of reading failure and indicative of resentment towards the 
teachers and children who create and observe their humiliation. 


Critchley (1968) paid special attention to the age of onset of emotional difficulties 
in retarded children in a London remand home and concluded that only exceptionally 
had factors predisposing to maladjustment affected the children of average intelligence 
before they began instruction in the basic subjects. 


There seems, therefore, to be a substantial body of research, often rather 
dependent on inference, to support the claim that reading disabilities predispose 
children to behave antisocially. 


Rutter et al. (1970) and Sturge (1972) have raised the question of the inheritance 
of reading disability as a means of establishing the primacy of problems in reading or 
antisocial emotional disturbances. A tendency to reading retardation has been 
thought to be inherited in at least some cases (Hallgren, 1950; Hermann, 1959; 
Kolson and Kaluger, 1963; Sturge, 1972; Vernon, 1972). If it is a consequence of 
maturational delays, as in speech development, then also it is likely to precede conduct 
disorders (Rutter, 1970 and 1974). However, this argument is not always accepted. 
Reading retardation is believed by others (Laubenthal, 1938; Blau, 1946; Pond, 
1967) to arise from the same sources of emotional difficulty as antisocial disorders. 
As Sturge (1972) points out, the best evidence of the primacy of reading retardation 
rather than antisocial disorders is to be obtained through longitudinal studies in which 
developmental delays can be established before any sign of socialisation problems 
emerge. Her own study of ten-year-old boys gave a slight indication that
developmental factors played a part in the disturbance of boys from ‘good’ homes, where
there were no overt signs of disorder and the home was intact. Where boys from
such homes became antisocial she suggested “they only manifested antisocial behaviour
in the situation which caused them most distress: the classroom”. Her observations
led her, in fact, to divide the children who showed both antisocial disorders and 
reading retardation into two groups, one group who demonstrated reading difficulties 
first and reacted at school against the humiliation they had caused, the other group 
for whom their numerous social disadvantages were enough to account for difficulties 
at school and at home. 


The view that emotional disorders precede reading retardation in a substantial 
number of cases is also a popular one. Burt (1937), Pearson (1952), Malmquist (1958), 
Morris (1959), the Ministry of Education Report (1962), Pond (1967) and Lawrence 
(1971) have subscribed to this view. However, this research has stressed anxiety and 
neurotic responses amongst the children rather more than antisocial behaviour. 
Children's emotional disturbances have been considered consequent upon their 
parents' child-rearing practices and their disordered home lives. Such disturbances 
have been regarded as creating impossible conditions for orderly behaviour in the first 
instance and later constructive work in the classroom. Rutter and Yule (1970) have 
opposed this interpretation having found that two-thirds of the reading retarded 
children in the Isle of Wight survey were not psychiatrically disturbed. For these 
children there were no apparent grounds for alleging emotional factors as productive 
of reading failure. The stringent definitions of reading retardation and emotional 
disorder used by the Isle of Wight survey may have led to this difference of opinion. 
Many previous studies had depended on teachers’ estimates alone or the views of 
single clinicians, whereas this latter survey made use of psychiatric examination as 
well as parental and teacher assessments. 
Another method of establishing primacy, in default of longitudinal studies, has
been that of comparing the three groups—the antisocial, the reading retarded and the 
antisocial reading retarded—on a number of behavioural and home background 
variables. If the antisocial reading retarded children were found to resemble one of 
the groups more closely than the other this was taken as evidence of the prior existence 
of this form of disturbance. Rutter and Yule (1970) found that this type of inquiry 
led to the conclusion that reading retardation was primary. Varlaam (1974) was 
largely in agreement. Sturge (1972) dissented, however, finding the children with 
both disorders lay midway between those with only one, not resembling one group 
more than the other. She commented that “adverse background features are more
important in reading retardation, particularly when associated with antisocial conduct
disorders than has hitherto been believed”. Offord and colleagues in a more
recent paper on probationers (1978) are largely in agreement, stating that depriving
family environments appeared equally in the backgrounds of boys whose poor school 
performance preceded their antisocial behaviour and those whose antisocial behaviour 
had been in evidence before school failure. Such environments they concluded 
disposed the child to antisocial behaviour, and if the child's IQ was within the normal 
range, but below average, would also put him at risk of educational retardation. 


Sturge, Rutter and Offord with their colleagues have commented on the need for 
longitudinal studies of pre-school children or children at the outset of their school 
careers to elucidate some of the difficulties in establishing aetiology and the sequence 
of events. This small scale study was designed to meet this need. The results reported 
below are limited to the questions of (a) whether antisocial behaviour accompanies 
problems in reading from the earliest months of school, and (b) whether it is a 
predictor of reading disability. Whether it follows reading difficulties as a response 
to possible loss of social status and self-esteem has been pursued at length elsewhere 
(McMichael, 1978). 

METHOD 
Sample 

198 boys entering the infant departments of eight Edinburgh primary schools 
were selected for behavioural and reading readiness screening. The boys were aged
between 4½ and 5½ when they started school in August, 1974. With few exceptions
they had fathers in manual occupations (nearly two-thirds skilled) and lived in rented 
accommodation in large council estates or in mixed areas near the city centre. Some 
of the families were living in poverty but the majority were in adequate circumstances. 


Screening instruments 

Children's Behaviour Questionnaire (Rutter, 1967), the CBQ. This questionnaire
was devised by Rutter for the Isle of Wight epidemiological survey. It contains 26
statements concerning the child's behaviour. The teacher must indicate whether a
statement ‘certainly applies’ (score 2), ‘applies somewhat’ (score 1), or ‘doesn't
apply’ (score 0) to the child in question. A score of nine or more is used to describe
a child as deviant. The items are clustered to produce six sub-scores. In this paper
the sub-scores for antisocial conduct and for neurotic symptoms are used (see Note 1).
Those children whose score is nine or more and whose neurotic score exceeds their 
antisocial score are designated Neurotic and those with an antisocial score exceeding 
the neurotic are designated Antisocial. All children with total deviance scores of 
eight or less are described as Stable. There are, therefore, three CBQ types, the 
Antisocial, the Neurotic and the Stable. It is important to remember, however, that 
the CBQ is merely an index of behaviour problems which appear at school. It is not 
a psychiatric diagnostic instrument. 
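
The classification rule just described can be written out as a short sketch. The cut-off of nine and the comparison of the two sub-scores come from the text; the function name, the argument names and the handling of tied sub-scores are assumptions, since the text does not say how a deviant child with equal antisocial and neurotic scores is designated.

```python
def cbq_type(total_score, antisocial_score, neurotic_score, cutoff=9):
    """Assign a CBQ type from the total score and the two sub-scores."""
    if total_score < cutoff:
        # Totals of eight or less are described as Stable.
        return "Stable"
    if antisocial_score > neurotic_score:
        return "Antisocial"
    if neurotic_score > antisocial_score:
        return "Neurotic"
    # Assumption: ties are not covered by the text, so they are flagged
    # here rather than forced into one of the three named types.
    return "Undifferentiated"


# Hypothetical score profiles, for illustration only.
for scores in [(8, 4, 1), (12, 5, 2), (9, 1, 3), (10, 3, 3)]:
    print(scores, cbq_type(*scores))
```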


Thackray Reading Readiness Profiles (1974). This set of group tests measures 
(i) vocabulary, (ii) auditory discrimination, and (iii) visual discrimination. A last 
test, based on the Harris-Goodenough Draw-a-Man Test, requires the child to draw
Mother. Predictive validity for the profiles is based on Pearson product-moment
correlations with reading test performance a year later. The coefficients range from
0.45 to 0.58.


The CBQ was issued to the teachers and completed in the tenth week of the first 
term of Primary One. The Reading Readiness Profiles were administered between 
the tenth and twelfth weeks of the same term. 


Criterion measures 

The Children's Behaviour Questionnaire was completed on two more occasions,
at the end of the first and at the end of the second years of primary school, when the
children were aged 5.9 and 6.9 years respectively.


The Southgate Tests of Word Selection (1958) and Sentence Completion (1962) 
formed additional criterion measures. Form A of the first test was administered to 
class groups at the end of their first year at school. At the end of their second year 
they were tested on Form C of the word selection test and also on Form B of the 
Sentence Completion Test. 


Children were described as ‘poor readers’ at the end of the first year in primary
school if they gained a score of six or less on the Southgate Word Selection Test
Form A. A score of six could be gained purely by chance on this 30-item test with
five choices for each item. At the end of the second year ‘poor readers’ were those
children who had scored 12 or less on Form C of the same test. This score was taken
as a cut-off point since it had been used to select the backward readers of the National
Child Development Study (Davie et al., 1972) who were only slightly older than the
children of the Edinburgh sample.
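
The chance level behind the first-year cut-off can be checked with a few lines of arithmetic. The sketch assumes pure, independent guessing on every item, which is a simplification introduced here and not a claim made in the text; the names are illustrative.

```python
from math import comb

N_ITEMS = 30      # items in the Word Selection Test, Form A
P_GUESS = 1 / 5   # five choices per item

# Expected number correct when every item is guessed.
expected_by_chance = N_ITEMS * P_GUESS


def prob_at_most(k, n=N_ITEMS, p=P_GUESS):
    """Binomial probability of k or fewer correct answers by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


print(f"expected score by chance: {expected_by_chance:.1f}")
print(f"P(score <= 6 by guessing): {prob_at_most(6):.3f}")
```

The expected score of six is what makes six a natural cut-off: a child at or below it has shown no more than guessing would produce.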


The final criterion measure was the Columbia Mental Maturity Scale
(Burgemeister et al., 1972), the CMMS. This individual test estimating general reasoning
ability requires no verbal response and a minimum of motor response. The child
must select from a series of drawings presented together on a large card the one which
does not belong. The test acts in this study as a measure of ‘intelligence’ as
demonstrated in concept attainment abilities.


RESULTS 


The Relative Prevalence of Antisocial and Neurotic Deviance 

The CBQ's results on the three occasions on which it was completed by class 
teachers indicate that antisocial types of deviance are consistently in the majority. 
This might at first glance appear to be a function of adaptation to school in the early 
years, but a comparison with the results from Camberwell (Sturge, 1972) reveals that 
amongst ten-year-old boys the percentages judged deviant in this way are remarkably 
similar. These figures, referring to children from areas with largely working class 
populations may not, of course, be representative of other parts of the UK. More- 
over, they should not be compared with those thorough-going assessments of 
psychiatric disorder where a proportion of one child in 20 is commonly found to be 
significantly disturbed. 


Reading Backwardness during the First Two Years in Primary School 

At the end of their first year at school 36 (19.7 per cent) of the 183 children left
in the sample received scores between 1 and 6, and were described as poor readers.
At this stage in a school career these scores would give a somewhat too liberal meaning
to the term ‘reading difficulty’, but at the end of the second year the proportion with
low scores (12 or less) had dropped to 15.0 per cent (N = 25) of the remaining 167,
close to the proportion of backward readers (13.7 per cent) found amongst the boys of the
National Child Development Study (Davie et al., 1972).
TABLE 1

PREVALENCE OF BOYS’ STABILITY AND DEVIANCE (ANTISOCIAL AND NEUROTIC) IN EDINBURGH AND
CAMBERWELL

                                                 CBQ types
                                   Antisocial     Neurotic       Stable        Totals
Samples                             N     %       N     %       N     %       N     %
Edinburgh, school entry
  (mean age 5.2 years)              33   16.7     15    7.6    150   75.7    198   100
Edinburgh, end of first year
  (mean age 5.9 years)              44   23.9     12    6.5    128   69.6    184   100
Edinburgh, end of second year
  (mean age 6.9 years)              30   17.8      6    3.6    132   78.6    168   100
Camberwell, age 10 years           111   16.8     51    7.7    497   75.5    659   100

TABLE 2

THE RELATIONSHIP OF BEHAVIOUR TO READING PERFORMANCE AT THE END OF FIRST AND SECOND YEARS

                                         Reading ability
                     End of first year                        End of second year
              Poor         Fair to good                 Poor         Fair to good
              readers      readers        Totals        readers      readers        Totals
              N     %      N     %       N     %        N     %      N     %       N     %
Antisocial    18   50.0    26   17.7     44   24.0      10   40.0    20   14.1     30   18.0
Neurotic       4   11.1     8    5.4     12    6.6       4   16.0     2    1.4      6    3.6
Stable        14   38.9   113   76.9    127   69.4      11   44.0   120   84.5    131   78.4

Totals        36  100.0   147  100.0    183  100.0      25  100.0   142  100.0    167  100.0


The Prevalence of Reading Difficulties Together with Behavioural Deviance 

Table 2 shows that reading difficulties are more apt to be associated with antisocial 
than with neurotic behaviour, even when the children are still in the infant departments 
of their schools. (In each section of the table the reading and behaviour measures are 
contemporary.) 


The overlap between reading difficulties and antisocial behaviour in this sample 
is of the order of 33 to 49 per cent. The proportions of antisocial children amongst 
the poor readers in Primary One and in Primary Two were rather higher than those 
found by Clark (1970), Rutter and Yule (1970), and Sturge (1972), who reported
frequencies of approximately one-third. The Edinburgh poor readers, as a proportion
of the Antisocial, were 40.9 per cent at the end of Primary One and 33.3 per cent a
year later. These percentages were less than the 49 per cent found by Sturge in the 
comparable district of Camberwell, though similar to the proportion of a third 
reported in the Isle of Wight. It would seem that the overlap between reading 
difficulties and antisocial behaviour typically concerns one third to a half of the 
reading retarded and similarly a third to a half of the antisocial. 
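
The two ways of expressing the overlap can be checked against the counts in Table 2. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the counts are those transcribed from Table 2.

```python
# Counts transcribed from Table 2 (boys classified by CBQ type and reading level).
TABLE_2 = {
    "end of first year": {"antisocial_poor": 18, "poor_total": 36, "antisocial_total": 44},
    "end of second year": {"antisocial_poor": 10, "poor_total": 25, "antisocial_total": 30},
}


def overlap_percentages(antisocial_poor, poor_total, antisocial_total):
    """Return the overlap expressed two ways, each rounded to one decimal place.

    First value: antisocial boys as a percentage of the poor readers.
    Second value: poor readers as a percentage of the antisocial boys.
    """
    share_of_poor_readers = round(100 * antisocial_poor / poor_total, 1)
    share_of_antisocial = round(100 * antisocial_poor / antisocial_total, 1)
    return share_of_poor_readers, share_of_antisocial


for stage, counts in TABLE_2.items():
    of_poor, of_antisocial = overlap_percentages(**counts)
    print(f"{stage}: {of_poor}% of poor readers were antisocial; "
          f"{of_antisocial}% of antisocial boys were poor readers")
```

The same joint count (antisocial poor readers) is divided by two different totals, which is why the overlap reads differently depending on whether one starts from the poor readers or from the antisocial boys.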


Antisocial Behaviour as a Precursor of Reading Difficulties 
An antisocial form of classroom deviance would appear to precede later reading 
difficulties in a considerable number of cases. Table 3 below points to the association 
between antisocial deviance at school entry and reading difficulties at the end of each
of the first two years in primary school. 


Although the result is significant, attention must be drawn to the fact
that the majority of the Antisocial children are not poor readers in either of these
two years. Many children can learn to read adequately in spite of considerable 
emotional disturbance noted at school. Nevertheless, their performance as a group 
is significantly worse than that of the stable children. An analysis of variance shows 
highly significant differences between the three groups of children, with the Antisocial 
consistently the poorest performers on all tests, whether administered at the end of 
Primary One or Two. 


TABLE 3

THE SEQUENTIAL RELATIONSHIP OF BEHAVIOUR AND READING PERFORMANCE

                              Reading at end of first year
                      Poor         Fair to good
CBQ types at          readers      readers       Totals
school entry          N     %      N     %       N     %
Antisocial            13   36.1    18   12.3     31   16.9
Neurotic               5   13.9     8    5.4     13    7.1
Stable                18   50.0   121   82.3    139   76.0
Totals                36  100.0   147  100.0    183  100.0

                              Reading at end of second year
                      Poor         Fair to good
CBQ types at          readers      readers       Totals
end of first year     N     %      N     %       N     %
Antisocial            12   48.0    29   20.4     41   24.5
Neurotic               4   16.0     6    4.2     10    6.0
Stable                 9   36.0   107   75.4    116   69.5
Totals                25  100.0   142  100.0    167  100.0

TABLE 4

ANALYSES OF VARIANCE OF READING PERFORMANCE BY BEHAVIOURAL TYPES

                                        CBQ types at school entry
Southgate reading tests           Antisocial   Neurotic   Stable     F
End of first year (Form 1A)
  Mean                               8.13        9.85     14.09    12.73
  SD                                 4.30        6.00      6.71    P<0.001
End of second year (Form 1C)
  Mean                              16.03       19.60     23.35    15.33
  SD                                 6.65        6.28      6.52    P<0.001
End of second year (Form 2B)
  Mean                               2.07        4.50     12.22    13.24
  SD                                 4.28        4.20     11.53    P<0.001





232 Emotional Disorders and Reading Disability 


Although emotional disorder of an antisocial kind is shown to precede later 
reading difficulties, a simple cause and effect model for the association is inappropriate. 
The children found to be Antisocial at school entry performed very much worse on 
the Thackray Profiles than the stable children, but only slightly worse than those 
dubbed Neurotic. 


Although the relationship between emotional disorders of an antisocial variety
and reading disabilities is the specific concern of this paper, the equally clear association
between neurotic forms of disturbance and poor reading performance should be
observed. In Table 3 we find that the Neurotic children are at least as likely to be 
poor readers as the Antisocial. In Tables 4 and 5 significant differences between the 
Neurotic and the Stable children exist on the visual and vocabulary profile scores and 
in reading performance on Form 1A and Form 2B of the Southgate Tests, whereas 
on no test do the scores of the Neurotic and Antisocial children differ significantly. 
These results are intriguing in so far as they run counter to the work reported by 
Rutter and Yule (1970). They are not, however, based on sufficient numbers to 
challenge seriously the generality of the earlier findings, drawn, as they were, from a 
larger and older age group. 


A regression analysis indicated that once the contribution of the Profiles to the 
variance on the reading tests had been made, antisocial behaviour as such had little 
to add. 

TABLE 5

ANALYSES OF VARIANCE OF THACKRAY PROFILE PERFORMANCE BY BEHAVIOURAL TYPES

                                        CBQ types at school entry
Thackray profiles                 Antisocial   Neurotic   Stable     F
Visual discrimination
  Mean                               6.39        8.65     12.15    10.13
  SD                                 5.41        6.47      7.15    P<0.01
Auditory discrimination
  Mean                               3.26        3.59      6.09     8.79
  SD                                 2.31        2.96      2.24    P<0.01
Vocabulary
  Mean                               6.87        8.18     12.07    22.25
  SD                                 3.77        4.75      4.39    P<0.001
Draw mother
  Mean                               7.32        8.00     10.05     9.97
  SD                                 3.48        3.22      3.36    P<0.01

TABLE 6

MULTIPLE REGRESSION ANALYSES OF THE CONTRIBUTION OF THACKRAY PROFILE PERFORMANCE AND
ANTISOCIAL BEHAVIOUR (at school entry) TO READING PERFORMANCE

                               Multiple r                Multiple r
                               Reading Test (Form 1A)    Reading Test (Form 1C)
Variables                      at end of first year      at end of second year
Thackray Profiles
  Visual discrimination               0.48                      0.40
  Auditory discrimination             0.54                      0.48
  Draw mother                         0.57                      0.54
  Vocabulary                          0.58                      0.59

CBQ Antisocial Behaviour              0.60                      0.61
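
The statement that antisocial behaviour "had little to add" can be made concrete by squaring the multiple r values in Table 6. The sketch below rests on an assumption about how the table was built, namely that each row gives the multiple r once that variable has joined those listed above it; the names used are introduced here for illustration.

```python
# Cumulative multiple r values transcribed from Table 6, in the order listed.
# Assumption: each row adds one predictor to those above it (stepwise entry).
MULTIPLE_R = {
    "Form 1A, end of first year": [
        ("Visual discrimination", 0.48),
        ("Auditory discrimination", 0.54),
        ("Draw mother", 0.57),
        ("Vocabulary", 0.58),
        ("CBQ Antisocial Behaviour", 0.60),
    ],
    "Form 1C, end of second year": [
        ("Visual discrimination", 0.40),
        ("Auditory discrimination", 0.48),
        ("Draw mother", 0.54),
        ("Vocabulary", 0.59),
        ("CBQ Antisocial Behaviour", 0.61),
    ],
}


def variance_increments(steps):
    """Return (variable, cumulative r squared, increment in r squared) per step."""
    results = []
    previous_r_squared = 0.0
    for variable, multiple_r in steps:
        r_squared = multiple_r ** 2
        results.append((variable, r_squared, r_squared - previous_r_squared))
        previous_r_squared = r_squared
    return results


for test_name, steps in MULTIPLE_R.items():
    print(test_name)
    for variable, r_squared, increment in variance_increments(steps):
        print(f"  {variable:<26} R^2 = {r_squared:.4f}  added = {increment:.4f}")
```

On this reading the final step, the entry of the antisocial score, raises the proportion of variance accounted for by only a small amount once the four Profiles are already in the equation.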

Emotional disorder can thus be seen not only to precede, and to some extent
predict, later reading difficulty but also to be accompanied by poor performance on 
tests which are specifically designed to give some indication of how children will 
approach reading. Emotional disturbance, whether of a neurotic or antisocial nature, 
is clearly related to poor performance on reading readiness tests, though the associ- 
ation with antisocial conduct is somewhat closer. Antisocial disorders may therefore 
appear to lead to poor reading test performance, but already at school entry they are 
associated with low competence in the very skills which contribute to later reading 
success.


So far it has been established that an antisocial behaviour disorder may precede 
reading difficulties though not necessarily produce them. In the next section the 
situation is reversed. 


Reading Difficulties as a Cause of Antisocial Behaviour 

Three groups of children were formed from those still in the sample at the end of 
Primary One. One group consisted of the boys whose antisocial score had lessened 
between the first and second completion of the CBQ (at the beginning and end of the 
first year in Primary One), the second group consisted of those whose antisocial score 
had stayed the same, and the third group was composed of the children whose anti- 
social score had increased by the end of the year. The same procedure was followed 
for the second year at school. These three groups were then examined to see whether 
their improvement or deterioration in antisocial score was related to their Thackray 
Profile and reading test performances. It was assumed that (a) if children were 
reacting to school difficulties by becoming antisocial then low scores on the Thackray 
Profiles would be likely to predict an increased antisocial score by the end of Primary 
One and (b) low scores on the reading tests in Primary One would predict an increased
antisocial score by the end of Primary Two. 
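
The grouping rule described above can be sketched as follows. This is an illustration only: the function name is an assumption, and the score pairs are hypothetical values, not data from the study.

```python
from collections import Counter


def classify_change(earlier_score, later_score):
    """Place a child in one of the three groups by the change in antisocial score."""
    if later_score < earlier_score:
        return "improved"        # antisocial score lessened
    if later_score > earlier_score:
        return "deteriorated"    # antisocial score increased
    return "unchanged"


# Hypothetical (earlier, later) antisocial scores for five boys; not study data.
example_pairs = [(3, 1), (0, 0), (2, 4), (1, 1), (0, 2)]
groups = Counter(classify_change(earlier, later) for earlier, later in example_pairs)
print(groups)
```

Note that a lower antisocial score counts as improvement, so the comparison runs in the opposite direction to the reading and Profile scores.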

During their first year at school, children who entered Primary One with the 
perceptual and language immaturity indicated by low scores on the Thackray Profiles 
were expected to become aware of being slower to grasp the intricacies of the reading 
process than their classmates. Their awareness and resentment of the situation, if it 
existed, should be apparent in increased antisocial behaviour. Similarly, between the 
end of the first and second years those children whose score on the first set of reading 
tests had been low might be expected to react to their performance and resulting low 
status in the class by deteriorating behaviour. 

Tables 7 and 8 compare the changes in antisocial behaviour between school entry, 
the end of the first and the end of the second years of primary school. 


TABLE 7

THE ASSOCIATION OF CHANGES IN ANTISOCIAL SCORES WITH THACKRAY PROFILE SCORES

                               Antisocial behaviour, end of first year
Thackray profile scores      Improved       Unchanged      Deteriorated
at school entry              N     %        N     %        N     %        Totals
Low scores* (Score 0-6)       3    4.4      10   10.9       9   39.1        22
High scores (Score 7-20)     65   95.6      82   89.1      14   60.9       161
Totals                       68  100.0      92  100.0      23  100.0       183

* See Note 2.
χ² = 19.86, df 2, P<0.001

TABLE 8

THE ASSOCIATION OF CHANGES IN ANTISOCIAL SCORES WITH READING PERFORMANCE

                               Antisocial behaviour, end of second year
Reading performance          Improved       Unchanged      Deteriorated
at end of first year         N     %        N     %        N     %        Totals
Low scores (Score 0-6)       10   31.2       9   11.1      13   24.1        32
High scores (Score 7-30)     22   68.8      72   88.9      41   75.9       135
Totals                       32  100.0      81  100.0      54  100.0       167

χ² = 7.44, df 2, P<0.05


Poor performance on the Thackray Profiles predicted with some success a 
continuing or increased level of antisocial behaviour over the subsequent year at 
school. Only three of the low scorers on the Profiles improved, the other 19 growing 
worse or remaining unchanged. 


At first glance it would appear that the picture remains the same when the
children move on to the following year and are well launched into reading. It would
seem that there is a continuity between poor ‘readiness’ and low reading attainment
and that both have an adverse effect on behaviour. Table 8 again reports a significant
result. A close look at the table, however, shows that the significance of the difference
between the high and low scorers is caused by the percentage ‘unchanged’ rather
than by the percentages ‘improved’ or ‘deteriorated’. In fact, when the centre
column of unchanged children is removed from the analysis the result is non-significant
(χ² = 0.51, 1 df, NS).
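
The chi-square comparisons for Tables 7 and 8, and the effect of dropping the ‘unchanged’ column, can be retraced with a short calculation. The sketch below assumes a plain Pearson chi-square without continuity correction; the original computational details are not stated, so the recomputed statistics need not agree to the last decimal with the printed ones. All names are introduced here for illustration.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and its df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


def drop_column(table, index):
    """Return a copy of the table with one column removed."""
    return [row[:index] + row[index + 1:] for row in table]


# Rows: low scorers, high scorers. Columns: improved, unchanged, deteriorated.
TABLE_7 = [[3, 10, 9], [65, 82, 14]]
TABLE_8 = [[10, 9, 13], [22, 72, 41]]
TABLE_8_WITHOUT_UNCHANGED = drop_column(TABLE_8, 1)

for label, table in [
    ("Table 7", TABLE_7),
    ("Table 8", TABLE_8),
    ("Table 8 without 'unchanged'", TABLE_8_WITHOUT_UNCHANGED),
]:
    statistic, df = pearson_chi_square(table)
    print(f"{label}: chi-square = {statistic:.2f}, df = {df}")
```

Removing the centre column leaves a two-by-two table with one degree of freedom, which is the comparison referred to in the text.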


Further caution must be exercised over the question of continuity, since only 
seven of the 22 children who were rated ‘low scorers’ on the Thackray Profiles were
rated ‘low scorers’ on the reading test administered a year later. The behaviour of
two of these actually improved over the intervening year. 


A last caveat: raising the cut-off point of the Thackray Profiles by one mark 
served to turn the highly significant result presented here into a non-significant one. 
Twenty-one more children were included amongst the low scorers by this adjustment. 
Eighteen of these improved over the following year. 


DISCUSSION 


This study has shown that antisocial behaviour is apparent in schools from the 
first months of primary school, and that it is much more in evidence than neurotic 
deviance. As in the previous work by Rutter and Yule (1970), Sturge (1972) and
Varlaam (1974), emotional disorder has been shown to accompany reading disabilities.
Amongst these infant school children both neurotic and antisocial types of deviance 
showed this association, whereas amongst older children the earlier studies have 
indicated that it is specifically antisocial disorders which accompany reading diffi- 
culties. The results reported here concern small numbers but indicate that when 
emotional disturbance is brought into the infant department at the start of school life 
then whether its presenting symptoms are neurotic or antisocial they interfere with 
learning. 


Comparison with the work of the authors cited above is not directly possible. 
The children in this study were defined as emotionally deviant on the basis of teachers’
questionnaire results alone. This was largely true for the researches of Sturge and 
Varlaam, but in the Isle of Wight emotional disorders were diagnosed by psychiatrists 
and attested by both teachers and parents. Furthermore Rutter and Yule, and also 
Sturge, had a measure of reading retardation which was unobtainable for the younger 
children of this study. They considered children retarded if they received lower scores 
than might be expected on the basis of their chronological age and their intelligence. 
In this study the children's performance was judged not in relation to their intelligence 
but only to their age, within the range of the 12 months in their class. Use of the 
CMMS as a measure of intelligence made it possible to show that all but one of the 
poor readers of this study were in fact ‘retarded’ (McMichael, 1978), but at the date
of the inquiry the degree of their retardation could not be compared with the earlier 
studies which established retardation only when the child was 28 months (Rutter and 
Yule, 1970) or 24 months (Sturge, 1972) behind the score expected for his age and 
intelligence. The Edinburgh children had been at school for less than 24 months 
when the research ended. 


To overcome this difficulty the children of this project should have been followed 
up for two or three more years. However, research by Clark (1970) in Dunbartonshire 
would suggest that by this time few of the children (1 per cent with at least one WISC 
score of 90 or above) would be severely handicapped in reading. The antisocial 
behaviour of the children in the second year with dual problems might well continue 
but their reading difficulties are likely to have abated. Such is the success reported 
for Scottish teaching of reading (Davie et al., 1972; McMichael, 1978) in the earlier 
years at school that it seems likely that the incidence of the dual problems of antisocial 
conduct disorders and reading retardation in the primary schools may be lower in 
Scotland than in the Isle of Wight or London. Where these problems occur to- 
gether in older Scottish children it may well be because the schools can combat the 
many disadvantages these children suffer only for the first few years, their nurtur- 
ance and teaching becoming less effective as neighbourhood and peer influences 
gain ascendancy. 


The results of the investigation in the preceding section would not support the 
contention that antisocial behaviour in school is necessarily the outcome of children’s 
reading difficulties and their problems of school adaptation. The direction of effects 
is obscured by the fact that over half (13) of the 22 boys entering school with limited 
cognitive skills gave contemporaneous evidence of aggression, destructiveness and 
disobedience. However, it was not only these children who demonstrated deteriorating
behaviour over the year. Amongst the nine low scorers on the Readiness Profiles 
whose behaviour grew worse there were equal proportions of the previously well 
behaved and the previously antisocial. No great weight should be attached to this 
one significant finding since it is not sustained in the following year when the focus 
shifts from pre-reading skills to reading itself. 


It may be that at this age (6 years 9 months on average at the end of second year)
children only rarely respond with antisocial behaviour to frustration and disappoint- 
ment in school. Difficulties at school may not be perceived as disappointments until 
children can fully understand the pressure of parental expectations of their per- 
formance. Though infant school children are aware of being behind their classmates 
in the book they are reading, this may seem of less importance to them than it does 
to their parents. Even if they have internalised parental ambitions, failure to realise 
these ambitions may not lead to lying, disobedience, theft or aggression. In fact, 
what may be true at the age of 9 or older may still be latent at the age of 7. 


Other research on this group of children has shown that the children who were 
both antisocial and in difficulties with their reading at the end of the first and second 
years had actually entered school often notably antisocial but also with significantly
poorer self-concepts than their classmates (McMichael, 1977). These self-concepts
were related to their competence in the Thackray Profiles and the Columbia Mental
Maturity Scale. It would therefore seem that the reading failure had not contributed
to the initial existence of low self-esteem, though in due course it might lead to yet 
further loss of confidence. The children who manifested antisocial behaviour and 
reading difficulties at the age of six to seven appeared to have entered school with a 
constellation of earlier problems connected with delayed linguistic, perceptual and 
cognitive development, low self-esteem and antisocial behaviour. The antisocial 
behaviour had not arisen from loss of self-esteem through reading failure so much as 
accompanied low self-esteem into school. 


Another strand of the argument supporting the development of antisocial behav- 
iours in response to school failure suggests that rejection by schoolfellows, as a result 
of poor school performance, leads to hostile and even delinquent responses in the poor 
performer. Little evidence amongst these young children was found for such a response 
(McMichael, 1978). Children with reading difficulties in the second year of primary 
school were not as a whole group significantly more rejected or unpopular than 
those who read adequately or well. However, when they were subdivided into the anti- 
social and the stable, differences emerged. Antisocial poor readers were very much 
more likely to be aggressive, and disliked for it, than were other groups. The stable 
poor readers were no more rejected (or aggressive) than the stable fair to good readers. 
The degree of stability was the crucial differentiating factor. These results suggest 
that rejection of a poor performer amongst children of this age is a consequence 
less of performance than of behaviour. Since poor behaviour was apparent at school 
entry, social status was likely to have been low from the outset rather than to have 
decreased when inability to cope with the academic requirements of the infant school 
was demonstrated. 


These results do not undermine the view that poor school performance leads to 
antisocial behaviour in children in later school years. They do, however, point quite 
clearly to the existence of a group of young children who seem to bring to school 
behavioural and cognitive disadvantages which affect their self-perceptions, the 
perceptions of them by others and their reading development. 


This study of young children has only been able to establish that the antisocial 
backward readers in the second year of primary school have not developed their 
antisocial deviance in response to their reading disability nor vice versa. Behavioural 
deviance noted by the teachers soon after school entry was associated with poor 
performance on the Reading Readiness Profiles administered at the same time. This 
fact, together with the findings on the early presence of low self-esteem and rejection 
by classmates, not only provides an argument for the genesis of many reading and 
behavioural problems outside the school but hints at the need to establish, in the 
child’s environment, his temperament or his delayed development other preconditions 
for these functional disturbances. 


NOTES 


1. Antisocial and Neurotic scores are derived from the following items: 

Antisocial—often destroys own or other’s belongings; frequently fights with other children; 
is often disobedient; often tells lies; has stolen things on more than one occasion; bullies 
other children. 

Neurotic—often worried, worries about many things; often appears miserable, unhappy, 
tearful or distressed; has had tears on arrival at school or has refused to come into the 
building this year; tends to be fearful or afraid of new things or new situations. 

2. Thackray Profile subscores were combined to create a composite score for the series of tests. Raw
scores on each Profile could not be summed to gain a total score so they were converted into
Thackray’s standardised grading from A to E. These grades were, in turn, reconverted into a
numerical value, ranging from five for an A down to one for an E. The Profiles’ new numerical
scores could then be summed to create a Reading Readiness Score with a range of 4 to 20. Twenty-two
of the boys (12 per cent of 183) gained scores between four and six. Another 21 received
scores of seven. A cut-off point of six was chosen to denote ‘low scorers’ because no children
scoring at this level attained a C (or average) grade in any of the Profiles. With a score of seven
most children gained at least one C. This seemed enough to rule out that individual as an all-round
‘low scorer’. Furthermore, a cut-off point which selected 12 per cent of the children resembled
that chosen by the National Child Development Study for their test of 7-year-olds' reading
attainment.
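
The scoring scheme of Note 2 can be sketched in a few lines. The grade-to-number conversion and the cut-off of six are taken from the note; the function names, and the assumption that exactly four Profile grades enter each composite (as the stated range of 4 to 20 implies), are introduced here for illustration.

```python
# Grade-to-number conversion described in Note 2 (five for an A down to one for an E).
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
LOW_SCORE_CUTOFF = 6      # scores of six or less denote 'low scorers'
NUMBER_OF_PROFILES = 4    # assumption: implied by the stated range of 4 to 20


def reading_readiness_score(grades):
    """Sum the numerical values of one child's four Profile grades."""
    if len(grades) != NUMBER_OF_PROFILES:
        raise ValueError("expected one grade for each of the four Profiles")
    return sum(GRADE_POINTS[grade] for grade in grades)


def is_low_scorer(grades):
    """True when the composite score falls at or below the cut-off."""
    return reading_readiness_score(grades) <= LOW_SCORE_CUTOFF


# Hypothetical grade profiles, for illustration only.
print(reading_readiness_score(["D", "D", "E", "E"]), is_low_scorer(["D", "D", "E", "E"]))
print(reading_readiness_score(["C", "D", "E", "E"]), is_low_scorer(["C", "D", "E", "E"]))
```

Moving the cut-off by a single mark changes which children count as low scorers, which is the sensitivity discussed in the Results.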
CULTURAL CHANGE AND MATHEMATICAL THOUGHT


By R. TANNER 
(Preston Polytechnic, Poulton Campus) 


AND E. ANNE TROWN 
(University of Lancaster) 


Summary. Sixty children of Indian, Pakistani or Bangladeshi origin who had spent all 
their school lives in England, 60 who had arrived within the past three years and 60 
British children were randomly selected in two age groups (10/11 years and 12/13 years) 
from the pupils of 22 schools in 11 towns in the industrial north-west of England. 
Individual mathematical tasks called for ‘relational thinking’ rather than instrumental
technique. No significant difference was found in the ability to abstract the nature of a 
pattern, to use this abstraction in the face of pattern disturbance or in the ability to 
generalise. The ability to employ hypotheses in the formulation of a number concept 
was significantly affected by age. It was only when overt verbal reasoning in English 
was required that length of exposure to host culture proved to be a significant factor. 
An interaction between length of stay and sex had a strong effect upon the quality of 
hypothesis orientated questioning in the English language. Asian girls who had 
completed a full British primary education showed marked superiority over their male
counterparts at age 10/11 and marginal superiority at 12/13. 


INTRODUCTION 


RESEARCH into the abilities of immigrant children in Britain has tended to concentrate 
upon patterns of specific differences between immigrant and indigenous groups, a 
main concern being the effect of degree of exposure to British schooling. Dosanjh 
(1968) found a positive attitude towards mathematics among Punjabi immigrant 
pupils though there is some doubt as to how far this extended beyond arithmetic. 
Like Sharma (1971) he found a positive relationship between performance and lengths 
of stay and schooling and also superior performance among boys compared with girls. 
Ashby et al. (1970) investigated the abilities and attainments of Indian and Pakistani 
children of both sexes in relation to experience of host culture. They found no 
significant differences between the long stay (> 9 years) group and native Scottish 
children on all tests, the medium stay group being significantly poorer on verbal 
reasoning and Raven's Matrices and the short stay (<3 years) group significantly
poorer on all tests. Immigrant boys were significantly superior to immigrant girls on 
all measures with long stay boys scoring significantly better than native Scottish 
children on teacher-rated formal and problem arithmetic procedures. 


In Britain the importance of structure and understanding has been stressed by 
secondary mathematics project teams who have strongly influenced examination 
syllabuses and text book design (Mathematical Association, 1976). Curriculum 
reformers in primary mathematics have advocated learning from experience and play 
and by discovery (Dienes, 1960). Opposition to these trends (Hammersley, 1968) has, 
however, culminated in widespread pressure for a return to traditional methods. 


Their interpretations of the views of educational psychologists have led mathematics
educators towards an emphasis on ‘process’ and away from ‘product’,
toward meaningful learning and away from reliance on memory. Skemp (1976), in
a cognitivist approach, has drawn attention to the limitations and the isolated context
of rote learning. He believes that “there are two effectively different subjects being
taught under the same name ‘mathematics’” (p. 22). The distinction between
teaching for instrumental memory-based understanding and teaching towards the
adaptive schemas of relational understanding is in his view far more important than
that between ‘modern’ and ‘traditional’ content, each of which can be taught in
instrumental or in relational fashion.


The immigrant child must contend with a new culture and a new language as 
well as learning situations which may be far removed from his previous experiences. 
He may be protected from confusions by being confined to ‘straightforward arithmetic’
instrumentally taught, but then doubts arise as to how far he is being equipped
for the changing needs of society. Surveys of studies among immigrant pupils in 
Britain (Goldman and Taylor 1965, Taylor 1974) show that, while attitudes towards 
and abilities in arithmetic have frequently been measured, there has been little 
investigation into potential for full mathematical thought. This is the purpose of the 
research reported here. 


METHOD 
Materials 


The decision to use attribute materials was based on the following considerations: 


(a) They provide a concrete/visual basis for relational thought. 

(b) They seemed to offer the least danger of cultural bias. 

(c) They had motivated immigrant and indigenous groups in pilot work in schools. 

(d) They were considered suitable by the teachers and mathematics advisers 
consulted. 

(e) They were readily available and easily transportable from school to school. 

(f) They had been used successfully in many previous investigations (e.g.,
Vygotsky, 1962; Bruner et al., 1956). 


The attributes, selected after a series of preliminary trials in schools, were the 
unordered ones of colour (red/blue) and shape (square, triangle, circle) and the 
ordered number set one, two, three. All possible combinations (e.g., two blue 
triangles) were displayed on a series of 18 cards. At the beginning of each learning 
session a check was made that the individual pupil concerned could readily identify 
and draw, in appropriate colour, what he saw on the cards and that he was familiar 
with the English words for the attributes. 
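
The set of 18 cards follows directly from combining the three attributes. The sketch below enumerates them; the variable names and the dictionary representation of a card are assumptions made for illustration.

```python
from itertools import product

# Attribute values as described in the text.
COLOURS = ("red", "blue")
SHAPES = ("square", "triangle", "circle")
NUMBERS = (1, 2, 3)

# One card for every combination of colour, shape and number.
cards = [
    {"colour": colour, "shape": shape, "number": number}
    for colour, shape, number in product(COLOURS, SHAPES, NUMBERS)
]

print(len(cards))  # 2 colours x 3 shapes x 3 numbers
print({"colour": "blue", "shape": "triangle", "number": 2} in cards)
```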


It was recognised that it is impossible to avoid all cultural bias. The coding 
function of language, the occurrence and role of the attribute in everyday life, all 
affect response. Cole et al. (1971), who used attribute materials to study mathematical 
thought among the Kpelle tribe of West Africa, point to the dangers of assuming any 
test, whether based on theories of cognitive development or models of the structure of 
intelligence, will evoke the same behaviour among different cultural groups. As one 
mathematics adviser pointed out, “Even putting a pencil in a child's hand introduces
a cultural factor”.


Activities 

A term's preliminary work was undertaken in schools among children as similar 
as possible to those who would be taking part in the investigation. The aim was to 
develop learning sequences which would reveal reasoning strategy but make only 
limited demands upon overt use of language by the pupil. No emphasis was to be 
placed on memory or on the reproduction of rote learned facts and techniques. It 
was decided not to impose a time limit on the teaching/testing sequence so that a 
pupil-teacher relationship could develop between child and tester. The tests were 
planned to engender confidence and enjoyment so that the effects of anxiety in this 
initial contact with a stranger would be minimised. The activities were conducted, 
always in the same sequence, one child at a time in the child's own school environment. 
They took from 30 minutes to one hour according to the individual. Since the prime 
objective was to help the teacher in the classroom it was decided to permit him to 
observe if he wished, especially as there were no signs of pupil anxiety due to teacher 
presence during pilot work. For immigrant pupils an interpreter, usually a member
of staff or older child, was on hand to ensure that the instructions given orally by the 
tester were understood. 


One influence upon the final version of the activities was Concept 7-9 (Schools 
Council, 1972), a course developed over a five-year period to encourage acquisition 
and command of language among children from widely differing social backgrounds. 
Another was the Mathematics Workshop television series (BBC, 1976). 


The first activity, ‘Missing Pictures’, was designed to minimise the use of
external language, to build up interest in the attribute materials, to allay anxiety and 
to establish the setting for the learning sequences to follow. In a familiarisation 
stage the 18 attribute cards were arranged by the experimenter in a matrix of three 
rows and six columns. All the cards with blue shapes on them were placed in a 3-row 
and 3-column matrix to the left of the arrangement and all the cards with red shapes 
on them in a 3-row and a 3-column matrix to the right of the arrangement. The blue 
matrix and the red matrix had identical internal arrangements, e.g., circles were placed 
along the first row in the order, one circle, two circles, three circles; triangles along 
the second row and squares along the third row in the same order. 
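
The familiarisation arrangement can be written out as a three-by-six grid. The sketch below follows the description above (blue cards on the left, red on the right, circles, triangles and squares in successive rows, numbers ascending within each half); the function name and the tuple representation of a card are assumptions made for illustration.

```python
ROW_SHAPES = ("circle", "triangle", "square")   # first, second and third rows
COLUMN_COLOURS = ("blue", "red")                # blue block on the left, red on the right
NUMBERS = (1, 2, 3)                             # ascending order within each block


def initial_layout():
    """Return the 3-row by 6-column matrix of (colour, shape, number) cards."""
    return [
        [(colour, shape, number) for colour in COLUMN_COLOURS for number in NUMBERS]
        for shape in ROW_SHAPES
    ]


for row in initial_layout():
    print("  ".join(f"{number} {colour} {shape}" for colour, shape, number in row))
```

Because the position of every card is fixed by its attributes, a child who has abstracted the pattern can name a hidden card from its row and column alone.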


The game was that the child was asked to look away while the experimenter 
turned over one of the 18 displayed cards (selected in accordance with a pre-determined 
random sequence). The child, who had been provided with two ball point pens, one 
‘obviously blue’ and the other ‘obviously red’, attempted to draw what he thought
was on the card that had been turned over. Care was taken to ensure that the pens 
were replaced on the table after each turn in order to avoid the possibility of the 
automatic use of the pen remaining in the hand after a previous attempt.
Encouragement was given in the form ‘Look at the pattern’ after failure. The activity was
complete after three successful attempts or a total of 20 attempts, whichever came 
first, the number of errors being recorded. 


The ‘blue on the left/red on the right’ pattern and the one, two, three order
along each row of the matrix were then disturbed according to three pre-determined
random rearrangements although, for the moment, the shapes were kept in the
original rows. In a third stage the three predetermined rearrangements of the matrix
were completely random though the 6 by 3 formation was retained. The ‘missing
picture’ game was played, as before, at each stage.


A second activity, ‘Word Meanings’, was adapted from the approach of 
Vygotsky (1962). It was decided to form nonsensical verbal labels from a Greek 
symbol set unfamiliar to both immigrant and indigenous groups. The perceptually 
dominant colour dichotomy was chosen for the first stage so as to encourage confidence 
in this activity. The “words” ped (to represent red) and ВЛ» (to represent blue)
were written on separate naming cards. The child was told that these were words in 
a new language and that each of the 18 attribute cards had one or other name written 
on the back. The attribute card with two red squares on it was shown to have ped
written on the back and then placed, attribute side up, alongside the ped naming
card. The child was asked to group all the attribute cards alongside one or other of 
the naming cards in the manner he thought to be correct. The tester then turned the 
attribute cards over until a wrong one was found in each category. The remaining 
cards were not looked at. Correct cards were placed attribute side up again and the 
wrong cards were put in correct categories. Attempts, which were all recorded, 
continued until all the cards were placed correctly. Then six extra attribute cards,
which had been designed to test for ability to generalise, were introduced. The child
was asked to place each one in the appropriate name group. Two carried extended
shape (hearts and crosses), two extended number (four and five) and two both
extended shape and number. 


A second stage of the ‘word meaning’ activity involved a more complex number
concept. The child was told clearly that the words were quite different in meaning
this time and did not refer to red and blue. The ‘word’ 868 was used for cards with
one or three shapes on them and vee for those with two shapes. The card with two
red squares was this time shown to have vee on the back and placed alongside the vee
naming card. The game proceeded as before until all cards were correctly placed.
Again, nine extra attribute cards were introduced. Three carried extended shape,
two extended colour (green) and one extended colour and shape. Three carried
extended number (four and five) to test what meanings (e.g., ‘two/not two’, ‘odd/even’,
etc.) had been given to the nonsense words.


The third ‘Questioning’ activity called for logical reasoning with limited
employment of the English language and was influenced by the research of Bruner
et al. (1956) and Unit 2 of Concept 7-9. Throughout, the matrix from the ‘missing
pictures’ game was displayed in its original ordered pattern. The experimenter
selected a card at random from a set which had been shown to be identical to the set
of cards on view. The face of the card was hidden from the subject who was asked
by the experimenter, and again for Asian pupils in mother tongue by the interpreter,
to discover what was on the card by asking questions in English. He was given the
following examples of possible questions: “Is it one?”, “Is it one triangle?”,
“Is it two blue triangles?”, “Is it triangles?”. He was told that only ‘Yes’ and
‘No’ answers would be given and that he should use as few questions as possible.
The first round of the game was treated as a practice. In general two further rounds 
proved sufficient to reveal the strategy of the child but where doubt existed the game 
went on, all attempts being recorded. Two girls in the short stay groups were unable 
to complete this activity because of inadequate or insecure English. 


Design 

To control variations within the samples it would have been desirable to confine 
the immigrant group to children originating from one limited area, e.g., Indian 
Punjabi Sikhs. It was decided, however, to include pupils from any country of origin 
in the Indian subcontinent, including those whose families had arrived via Africa. 
This was so that the teachers who had been consulted would be provided with the 
type of general information they had said would be most useful to them. The term 
‘Asian immigrant pupil’ in this context is used to refer to school children, whether
born in Britain or not, whose parents were of Indian, Pakistani or Bangladeshi origin. 


The enquiry was conducted in the industrial towns of the north-west of England. 
It was decided to avoid cities with above 100,000 population because of their special 
educational problems. Eleven towns with a sizeable Asian immigrant community 
were included. It is not possible to identify these because of guarantees of confi- 
dentiality that were given. Nine primary schools and 13 secondary schools, with 
Asian immigrants on roll, were chosen in consultation with LEA advisers.


Three categories of exposure to English culture were defined: an Asian ‘short
stay’ group (less than three years in the English school system); an Asian ‘long
stay’ group (children who had been right through the English school system from
5 or 6 years of age) and an ‘indigenous control’ group (children born in Britain to
families of British origin and educated here). Of the 180 children who took part, 15 
were in each of the 12 groups created by sex (boy/girl) and age (10/11 or 12/13) 
dichotomies in combination with the three levels of exposure to host culture. 
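
The factorial layout just described can be enumerated as a quick check on the group count. This is only an illustrative sketch: the label strings are chosen here for readability and are not taken from the study's records.

```python
from itertools import product

# Factor levels as described in the Design section (label strings are illustrative).
EXPOSURE = ("short stay", "long stay", "indigenous control")
SEX = ("girl", "boy")
AGE = ("10/11", "12/13")
CHILDREN_PER_CELL = 15  # stated in the text

# Every combination of the three factors is one group (cell) of the design.
cells = list(product(EXPOSURE, SEX, AGE))
total_children = len(cells) * CHILDREN_PER_CELL

for exposure, sex, age in cells:
    print(f"{exposure:<20}{sex:<6}{age:<7}n = {CHILDREN_PER_CELL}")
print("groups:", len(cells), "children:", total_children)
```

Three exposure levels crossed with two sexes and two age bands give the 12 groups, and 15 children per group give the 180 children reported.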


Some schools put forward the idea of matching for ability or intelligence but 
this was rejected in the absence of a culture fair test because the use of school-based 
criteria would have carried the risk of a control group biased towards low ability 
English children. Although apparently well-motivated for success, many of the Asian 
children had drifted into lower streams because of language problems. It was decided 
therefore to rely upon random selection from the children available for each group
within each school and the control groups contained boys and girls regarded by their 
teachers as representing the complete spectrum of ability. Nor was there any attempt 
to match for socio-economic background since status in Britain need not reflect that 
in the country of origin. 


In each school one or two children (and in two particularly large secondary 
schools, three children) were randomly chosen at each level of exposure in each age 
group. In a few cases the number of Asian boys or girls available for selection in a 
particular age group in a particular school dropped below five. Since the final sample 
had been spread over 22 schools in order to control the effects of educational philo- 
sophy and teaching method, there was no reason, however, to believe that the final 
samples were unrepresentative of the pupils of the north-west of England. 


RESULTS 


All children were ultimately successful in the familiarisation phase of the 
‘Missing Pictures’ activity. 125 children, among them 12 of the 30 ‘short stay’
10/11 year olds, recognised and used the original ordered matrix pattern at the first 
attempt. In this younger age group the performance of the short stay girls was 
initially the weakest (4 girls out of 15 making more than four errors with the ordered 
pattern) and weakened further in the face of disturbance of the matrix pattern (10 
girls out of 15 making more than four errors when the matrix was fully disturbed). 
Since the tester was male it is possible that, in spite of all efforts to establish an easy 
atmosphere, emotional factors placed these girls at a disadvantage in this initial 
activity. In the older age group errors were infrequent even when the matrix pattern 
was completely disturbed. The evidence was of the weakness of certain individuals 
rather than of any group. 


Analysis of the ‘Word Meaning’ activity centred upon the use of hypotheses
(whether in a focusing or in a scanning strategy, Bruner et al., 1956). Categories of 
hypothesis A to F (Table 1) were combined to distinguish between children who had 
clearly learnt to employ hypotheses to obtain a meaning (A, B and C) and those who 
had not (D, E and F). In the first stage of the activity colour (successfully) had been 
the first choice attribute of most pupils who were using hypotheses (over twice as 
popular a choice as shape and over nine times as popular as number). The second 
stage of the activity, involving a change to a number concept, was therefore much 
more difficult and this combination of categories had the virtue of giving a ‘strategy
not used’ classification to those children who depended solely on the colour attribute
in the second stage even though instructed that the basis of word meaning was quite 
different this time. 


Success was readily achieved in both age ranges at the first (colour concept) stage. 
Differences between groups in respect of strategy employed were small (Table 2) and, 
again, weaknesses were those of individuals. 


TABLE 1


USE OF HYPOTHESES (WORD MEANING ACTIVITY)


Strategy used      A. Clear use of successive hypotheses.
                   B. No initial hypotheses but subsequent clear use.
                   C. Clear use at start and in conclusion but vagueness in middle attempts.
Strategy not used  D. Clear use of hypothesis in early attempts but subsequently no apparent use.
                   E. Some evidence of hypothesis within an attempt but use then abandoned.
                   F. No evidence of use of hypotheses; apparent random placement of cards.


TABLE 2


USE OF STRATEGY IN COLOUR CONCEPT STAGE OF WORD MEANING ACTIVITY


                          Age 10/11               Age 12/13
                     Strategy   Strategy     Strategy   Strategy
Group                used       not used     used       not used
Short stay   Girls     11          4           10          5
             Boys      10          5           14          1
Long stay    Girls     11          4           14          1
             Boys      10          5           12          3
Control      Girls     13          2           15          0
             Boys      13          2           12          3





Few of the younger children, especially among the short stay immigrants, carried 
the use of hypothesis through to the number concept stage although older children 
were much more successful (Table 3). The numbers in brackets in Table 3 refer to 
the number of children who made colour alone the basis of an initial hypothesis even 
though they had been instructed that the word meaning was different this time. 


Statistical analysis was carried out by means of the GLIM computer program of
the Oxford Numerical Algorithms Group. The program analyses a contingency table
for a response variable (in this case use of strategy) and a number of classification
variables (sex, age and exposure to host culture). It gives an analysis of deviance
for each classification variable and for each possible interaction (e.g., sex/age). These
deviances are compared with χ² with the corresponding degrees of freedom. The use


TABLE 3


USE OF STRATEGY IN NUMBER CONCEPT STAGE OF WORD MEANING ACTIVITY


                          Age 10/11               Age 12/13
                     Strategy   Strategy     Strategy   Strategy
Group                used       not used     used       not used
Short stay   Girls     3(1)      12(6)         9(5)       6(3)
             Boys      2(0)      13(7)         5(3)      10(8)
Long stay    Girls     7(2)       8(3)         6(4)       9(7)
             Boys      5(3)      10(4)         8(7)       7(5)
Control      Girls     4(1)      11(4)         9(5)       6(4)
             Boys      8(6)       7(4)        11(5)       4(3)

Total sample          29         61           48         42


TABLE 3b


COLLAPSED TABLE: USE OF STRATEGY IN RELATION TO EXPOSURE TO HOST CULTURE


                     Short stay   Long stay   Control
Strategy used            19          26         32
Strategy not used        41          34         28


of a collapsed table is indicated where significance occurs. In respect of the number
task the only clearly significant deviance, that for age (P < 0.01), indicates a collapsed
total sample row reflecting the much more frequent use of strategy among older
children. A deviance approaching significance for level of exposure to host culture
(0.05 < P < 0.10) indicates a table collapsed by summing over age and sex (Table 3b).
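
The comparison of a deviance with χ² can be illustrated on the collapsed Table 3b. The sketch below is not a reproduction of the GLIM run: it assumes that the likelihood-ratio statistic for independence in the collapsed two-way table is an adequate stand-in for the exposure deviance, and the function names are introduced here for illustration only.

```python
import math

def g_squared(table):
    """Likelihood-ratio statistic (deviance) for independence in a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to the sum
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df

def chi2_upper_tail_df2(x):
    """Upper-tail probability of chi-square on 2 degrees of freedom (exactly exp(-x/2))."""
    return math.exp(-x / 2.0)

# Table 3b: rows = strategy used / not used; columns = short stay, long stay, control.
table_3b = [[19, 26, 32],
            [41, 34, 28]]

g2, df = g_squared(table_3b)
p = chi2_upper_tail_df2(g2)
print(f"G2 = {g2:.2f} on {df} df, P = {p:.3f}")
```

Under these assumptions the statistic falls between the 10 per cent and 5 per cent critical values of χ² on 2 degrees of freedom (4.605 and 5.991), in line with the reported 0.05 < P < 0.10.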


The long stay group were particularly successful in their ability to generalise 
beyond the materials originally presented in the Word Meaning Activity (Table 4) 
although a clear weakness existed among short stay immigrants in the 10/11 age group. 
The GLIM program produced no significant differences although age and level of 
exposure deviances both approached significance. 


The pupil questioning sequences arising from the third activity were analysed in
a search for systematic approach. At first three categories were employed: ‘A
systematic hypothesis-based approach throughout’; ‘No system at any time’; and
‘Some evidence of system on some occasions’. Later the third category was eliminated
by reallocating 0.5 from each entry to each of the other two categories. The
results of the GLIM program revealed a level of exposure deviance significant at the

TABLE 4


ABILITY TO GENERALISE BEYOND ORIGINAL MATERIALS


                           Age 10/11                     Age 12/13
Success in           Short    Long                Short    Long
generalisation       stay     stay    Control     stay     stay    Control
Complete success       7       17       18         19       19       22
Partial success       10        7        8          6        7        1
No success            13        6        4          5        4        7


TABLE 5a


USE OF ENGLISH VERBAL QUESTIONING


                             Age 10/11                      Age 12/13
Group                Systematic   Non-systematic    Systematic   Non-systematic
Short stay   Girls       5.5           8.5              4.5           9.5
             Boys        6             9               10.5           4.5
Long stay    Girls      13.5           1.5             11.5           3.5
             Boys        2            13                9.5           5.5
Control      Girls       9             6               10             5
             Boys       13             2               12.5           2.5

TABLE 5b


COLLAPSED TABLE: USE OF QUESTIONING IN RELATION TO EXPOSURE TO HOST CULTURE


Group          Systematic   Non-systematic
Short stay        26.5          31.5
Long stay         36.5          23.5
Control           44.5          15.5


TABLE 5c


COLLAPSED TABLE: USE OF QUESTIONING IN RELATION TO LENGTH OF STAY AND SEX


Group          Systematic   Non-systematic
GIRLS
Short stay        10            18
Long stay         25             5
Control           19            11
BOYS
Short stay        16.5          13.5
Long stay         11.5          18.5
Control           25.5           4.5


1 per cent level (Table 5b) as well as a sex/level of exposure interaction significant at
the 0.1 per cent level (Table 5c). Underlying the results was a remarkably strong
performance by the long stay girls in the younger age group (Table 5a) which was in 
contrast to the weakness of the equivalent group of immigrant boys. 
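
The half-point reallocation of the third category and the collapsing of Table 5a into Tables 5b and 5c can be sketched as follows. The cell values are those read from Table 5a; the raw three-category counts passed to `reallocate` in the example are hypothetical, since the original three-way counts are not reported, and the function names are introduced here for illustration.

```python
def reallocate(systematic, no_system, some_system):
    """Share each 'some evidence' child half-and-half between the two remaining categories."""
    half = 0.5 * some_system
    return systematic + half, no_system + half

# Table 5a: (exposure, sex) -> {age group: (systematic, non-systematic)}.
table_5a = {
    ("short stay", "girls"): {"10/11": (5.5, 8.5), "12/13": (4.5, 9.5)},
    ("short stay", "boys"): {"10/11": (6.0, 9.0), "12/13": (10.5, 4.5)},
    ("long stay", "girls"): {"10/11": (13.5, 1.5), "12/13": (11.5, 3.5)},
    ("long stay", "boys"): {"10/11": (2.0, 13.0), "12/13": (9.5, 5.5)},
    ("control", "girls"): {"10/11": (9.0, 6.0), "12/13": (10.0, 5.0)},
    ("control", "boys"): {"10/11": (13.0, 2.0), "12/13": (12.5, 2.5)},
}

def collapse_over_age(table):
    """Sum the two age groups within each exposure-by-sex cell (gives Table 5c)."""
    collapsed = {}
    for key, by_age in table.items():
        systematic = sum(pair[0] for pair in by_age.values())
        non_systematic = sum(pair[1] for pair in by_age.values())
        collapsed[key] = (systematic, non_systematic)
    return collapsed

def collapse_over_sex(table):
    """Sum girls and boys within each exposure level (gives Table 5b)."""
    collapsed = {}
    for (exposure, _sex), (systematic, non_systematic) in table.items():
        prev_sys, prev_non = collapsed.get(exposure, (0.0, 0.0))
        collapsed[exposure] = (prev_sys + systematic, prev_non + non_systematic)
    return collapsed

table_5c = collapse_over_age(table_5a)
table_5b = collapse_over_sex(table_5c)

# Hypothetical raw counts (4 systematic, 7 no system, 3 some system) for one cell.
example_cell = reallocate(4, 7, 3)

for key, counts in table_5c.items():
    print(key, counts)
for key, counts in table_5b.items():
    print(key, counts)
```

The short stay totals come to 58 rather than 60 because two girls in the short stay groups could not complete the activity.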


DISCUSSION 


The Asian immigrant pupil, who on arrival may have had little experience of the 
manipulation of organised forms of concrete or visual material, nevertheless possesses 
abilities crucial to the development of relational thought. The Asian children who 
took part in this investigation were able to arrive at a recognition of relations by 
spontaneously coordinating perceptions, to order visual materials, to abstract the 
nature of a pattern and to use this abstraction when pattern and order were disturbed. 
They also employed hypotheses to associate a number concept with a new label from 
a new language, this ability developing with age and intellectual maturity as it did 
among the British children. It was only when a task demanded verbal reasoning 
through the use of questions in English that exposure to host culture proved to be a 
factor significantly affecting success. The Asian pupils were shown to be individuals 
with individual strengths and weaknesses, who are faced with a series of particularly 
difficult intellectual challenges in the course of their everyday school lives. 


Bruner (1964) has suggested that intellectual development is at first through 
action and later through images which summarise action. Through the use of 
symbolic systems experience is translated into language, which may be regarded as 
an internalised instrument of thought. The idea of manipulation of materials is 
combined with the ‘internal reorganisation’ model of learning in a developmental 
approach which has strongly influenced the English junior school curriculum. 
Plainly, for the recently arrived immigrant there should be compensatory activities 
but, if these are labelled ‘remedial’, it becomes too easy to confuse them with the
activities appropriate to slow-learning pupils. Plainly there should be extra ‘English
language’ classes also but, if this label is too narrowly interpreted, the child may miss
some of the experiences essential to the development of more powerful hypothetically 
orientated forms of language. If at the same time his mathematics lessons consist of 
well-meant attempts to help him catch up with instrumental techniques, attempts 
which bypass the use of concrete materials, then potential for relational thought may 
remain concealed. 


An interaction between length of stay and sex had a strong effect upon verbal 
questioning ability in the English language. The quality of questioning among Asian 
girls of both age groups who had completed a full primary school education in this 
country was high and seemingly unaffected by any reluctance to complete tasks set 
by a male researcher or by emotional factors of the type described by Ghuman (1975).
Girls are generally regarded as better on logical tasks involving verbal ability and as 
coming into their own in terms of verbal performance at the age of 10 or 11 when they 
are likely, anyway, to be further ahead in their physical development than their male 
counterparts. The younger Asian boys may value and be satisfied by proficiency in 
instrumental mathematics and the activities given could not have been completed by 
following a set of learned facts or rules. Asian boys had, however, retrieved their 
position in respect of logical verbal questioning in English by the age of 12 and 13. 
This confirms the conclusions of Maccoby (1967) and Witkin et al. (1962) who suggest 
that social and emotional influences predominate in adolescence causing bright girls 
to hesitate to compete with boys, who in their turn are finding new purpose in their 
studies. Sex differences stemming from an encouragement of a dependent role in 
women may be accentuated once in the case of mathematics, which many girls come 
to believe is irrelevant to their future work and life style, and again in the case of a 
culture where the male is still seen as dominant provider and the female almost 
exclusively in a nurtural role following early marriage. 


Immigrant children are coping with strong contrasts between the home and 
school environments. Classmates pursue the life style of a rapidly changing culture
while parents cling to old traditions, perhaps more closely in the new country than 
they would have done in the old. There is a clear need for further investigation into 
the ways in which this conflict between cultures affects the preparation of the immigrant 
pupil for the technological and social changes taking place in our society. 


This was a pilot investigation into a vast area of mathematical thinking. Further 
research would call for more theorising on the nature of the thought processes 
involved, especially in the case of children who are establishing a second language. 
A contrasting pragmatic approach to the development of activities which would 
enable the teacher to recognise strengths and diagnose weaknesses would be equally 
useful. Hidden within this study is a series of individual case studies of immigrant
children who completed the activities with enthusiasm and some success but who were
later found to be regarded as of low ability in mathematics as well as in reading. In
several cases it seemed that this new information helped the school to adapt planned 
learning sequences to the individual. In the absence of that ideal long-term research 
which would relate to many different situations as well as to the use of many different 
materials, a cumulative programme of small-scale investigations, preferably mounted 
by teachers in their own schools, seems to offer the best prospect of information 
relevant to the teacher working with immigrants. 


ACHIEVEMENT RELATIVE TO A MEASURE OF
GENERAL INTELLIGENCE 


By J. A. GLOSSOP, R. APPLEYARD AND C. ROBERTS 
(School of Education, University of Leeds) 


SUMMARY. Four indices of achievement relative to a measure of general intelligence
were constructed for a sample of 178 adolescents. Measures of achievement were taken
from tests in reading comprehension and mathematics, from an English essay and from
the results of public examinations at 16+. Relative achievement measures were derived
from the regression residuals of the four separate achievement scores against scores on
the AH4 test of general intelligence. Using reliability estimates for two achievement
tests, reading comprehension and mathematics, and for the intelligence test, it was 
calculated that 69 per cent of the variance in the regression residual for reading compre- 
hension, and 56 per cent in that for mathematics, were independent of unreliabilities in 
the tests. Analysis of the relationships between the four regression residuals indicated 
a single major component on which all the residuals were highly loaded, but the low 
correlation found between essay and mathematics with intelligence controlled suggested 
the involvement of two independent skills. Correlations with an additional set of 
variables indicated that pupils from poorer social backgrounds, with poorer records of 
attendance and poorer ratings of conduct in school obtained lower scores relative to 
their intelligence test scores for all the achievement measures; boys tended to obtain 
lower scores than girls except in mathematics where there was no significant sex 
difference. 


INTRODUCTION 


WILSON (1928) and CRANE (1959) both argued that achievement and accomplishment 
quotients, which were used in early studies of under-achievement, would yield 
misleading results by over-estimating the number of under-achievers of high IQ and 
under-estimating those of low IQ in any group of children. Yule et al. (1974) found 
empirical confirmation of this prediction in a study of reading achievement in five 
large samples and showed that, in these samples at least, the effect was too large to 
be discounted. Following Thorndike (1963) they suggested that under-achievement 
might be better estimated by the discrepancy between an individual's score on an 
attainment test and the score predicted from the linear regression of the attainment 
test scores against IQ scores for the whole group studied. This is the equivalent of 
measuring attainment relative to a measure of general intelligence. 


The regression procedure gives a normal distribution of discrepancy scores 
(regression residuals) with a mean of zero provided that the scores on the IQ and 
attainment tests are normally distributed and that the two tests are linearly related 
to each other. (If these two conditions are not fulfilled, the use of a regression 
residual may introduce a bias of its own into the estimation of under-achievement.) 
Thus the proportion of those falling below the achievement level predicted for their 
IQ level, the under-achievers, will be balanced by an approximately equal proportion 
of those exceeding that predicted level: these have been referred to as over-achievers. 
Such an effect was observed by Yule et al. (1974). Using linearly related tests with
normal distributions of scores, they found a nearly normal distribution of residuals 
in their samples, centred around a mean close to zero. They concluded that under- 
achievement/over-achievement would characterise normal children, and they hypo- 
thesised that variation in under-achievement/over-achievement could be related to a 
range of genetic and environmental influences. A slight, but statistically significant, 
over-population in the extreme under-achievement tail of the distributions in three of 
the five samples was taken as evidence for the existence of a small proportion of 
children with pathological reading deficiencies. 
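
The regression-residual measure described above can be sketched in a few lines. The scores below are hypothetical and serve only to show the mechanics: attainment is regressed on IQ by ordinary least squares, and each pupil's residual (observed minus predicted attainment) is taken as the under-achievement/over-achievement score.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(x, y):
    """Observed y minus the value predicted from x for each case."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical IQ and reading scores for eight pupils.
iq = [85, 90, 95, 100, 105, 110, 115, 120]
reading = [38, 47, 45, 52, 50, 61, 58, 66]

res = residuals(iq, reading)
for score, r in zip(iq, res):
    label = "under-achieving" if r < 0 else "over-achieving"
    print(f"IQ {score}: residual {r:+.2f} ({label})")
```

By construction the residuals sum to zero and are uncorrelated with IQ, which is why under-achievers are balanced by over-achievers at every IQ level when the stated assumptions hold.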


The operationalisation of under-achievement as a regression residual, the 
observation that normal children show variation along this dimension that extends 
symmetrically into a positive sector of over-achievement, and the hypothesis that 
such variation might coincide with both genetic and environmental conditions, raise 
two main questions. The first question is fundamental. The residuals are based on 
a difference between two measures and it is necessary to investigate to what extent any 
discrepancies which are being interpreted as under-achievement/over-achievement 
reflect merely the unreliabilities of the tests which are being used. If it can be shown 
that the residual measure contains a substantial component which cannot be attributed 
to the unreliabilities of the tests, the second question may be asked: to what extent 
does this component go beyond transient levels of performance to represent aspects 
of achievement that persist over time and across different academic areas, and that 
relate also to non-academic variables?


The researches reported in this paper were directed to these particular questions. 
There were three principal concerns: (i) to investigate the effects of test reliability on
two sets of regression residuals, one based on a test of reading comprehension, the 
other based on a test in mathematics; (ii) to investigate the relationships between 
four different sets of residuals—those referred to which were based on reading 
comprehension and mathematics, a third based on an English essay, and a fourth 
based on achievement in examinations at 16+; and (iii) to investigate the relationships 
of these four residuals with two background variables—sex and social background, 
and with two behavioural variables—attendance at school and ratings by teachers of 
conduct in school. 

METHOD 
The sample 

The sample comprised 178 pupils aged 15-16 from the fifth form of a long 
established comprehensive school taking pupils from the age of 11 through to 18; 
this sample constituted 90 per cent of the total cohort of 197 pupils. The school was 
set in a semi-rural locality on the outskirts of a large urban population centre where 
the pattern of social background appeared slightly less favoured than average. For 
example, the mean social class on a four-point scale (from professional-managerial to 
unskilled manual) was 3.0 as compared with a national mean of 2.7 (HMSO, 1976);
the mean score on the AH4 test of general intelligence was 64.9 as compared with a
norm for the corresponding age group of 69.6 (estimated from data in Heim, 1970).
On one available criterion of performance in examinations at 16+, however, achievement
in the school was above average: 63 per cent of the pupils in the enquiry cohort
gained one or more GCE O-level or O-level equivalent pass as compared with a
national mean for the same year of 56.2 per cent (Wright, 1977).


Tests 

General intelligence, reading comprehension and mathematics: The first object of 
the investigation was to check the effects of test reliabilities on two regression residuals. 
To calculate these residuals with respect for the appropriate assumptions it was 
necessary (a) to use tests which in earlier administrations had yielded normally
distributed patterns of scores and which were likely to be linearly related to each 
other; and (b) to use tests for which reliability coefficients were available. Three 
tests satisfied these conditions: the AH4 test of general intelligence (Heim, 1970), 
the Manchester Reading Comprehension Test (1967) and the Graded Arithmetic-Mathematics
Test (Vernon, 1970). All are timed tests requiring the solution of short,
closely defined problems while in general appearance and design format they are as 
much alike as is compatible with the different nature of the tasks set. It was hoped 
that this similarity would limit the differential effects of motivation and attitudes in 
the face of different types of task. 


Distributions of scores in the sample for these tests were satisfactorily near
normal. Relationships between the attainment tests and the AH4 test were checked
for linearity by one-way analysis of variance: for these analyses each variable in
turn was categorised into deciles while its partner for the analysis retained its original
distribution. In this way linearity was tested in each direction for each pair of
variables: AH4 with mathematics, and AH4 with reading comprehension. In each case the
generalised correlation coefficient (η) was within 0.04 of the linear correlation
coefficient (r).
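
The linearity check can be illustrated by comparing the correlation ratio η, obtained from a one-way analysis of variance on decile groups, with the linear correlation r. The sketch below uses artificial data (one exactly linear relation and one symmetric curve), shows only one direction of the check, and uses function names introduced here for illustration; it is not the authors' program.

```python
import math

def pearson_r(x, y):
    """Linear (product-moment) correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def decile_groups(x):
    """Assign each case a decile index (0-9) according to its rank on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    groups = [0] * len(x)
    for rank, i in enumerate(order):
        groups[i] = rank * 10 // len(x)
    return groups

def correlation_ratio(groups, y):
    """eta = sqrt(between-groups sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    by_group = {}
    for g, yi in zip(groups, y):
        by_group.setdefault(g, []).append(yi)
    ss_between = sum(len(v) * (sum(v) / len(v) - mean_y) ** 2
                     for v in by_group.values())
    return math.sqrt(ss_between / ss_total)

# Artificial data: an exactly linear relation and a symmetric curve.
x = list(range(100))
y_linear = [2 * xi + 1 for xi in x]
y_curved = [(xi - 49.5) ** 2 for xi in x]

groups = decile_groups(x)
r_linear, eta_linear = pearson_r(x, y_linear), correlation_ratio(groups, y_linear)
r_curved, eta_curved = pearson_r(x, y_curved), correlation_ratio(groups, y_curved)
print(f"linear: r = {r_linear:.3f}, eta = {eta_linear:.3f}")
print(f"curved: r = {r_curved:.3f}, eta = {eta_curved:.3f}")
```

For the linear relation η and r nearly coincide, whereas for the curved relation η is large while r is close to zero; a small gap between η and r is therefore taken as evidence of linearity. The full check would repeat the calculation with the roles of the two variables exchanged.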


The English essay: Grades were awarded on a seven-point scale for a single essay 
completed under test conditions. Two essay topics were offered as alternatives: A 
Holiday Adventure or I Shall Never Do That Again. The essays were divided into 
four groups and marked once by one of four markers: all scripts were then checked 
by a fifth marker and any substantial discrepancies in mark resolved after further 
inspection of the scripts in question. The resulting distribution of grades was 
approximately a grouped normal distribution and the grades on average were not 
significantly affected by choice of topic or marker. The relationship between this 
variable and AH4 score was tested for linearity as above except that the essays were
already grouped into seven categories. In this case the generalised correlation coefficient
(η) was similarly close, within 0.03, to the linear correlation coefficient (r).


Examination performance at 16+: This measure was the sum of GCE O-level 
and CSE passes obtained by each pupil, weighted for the grade of pass and examination. 
For each examination subject the lowest O-level pass was given a score of 5, the same 
as the highest CSE pass grade: the top O-level grade was given a score of 9 and the 
lowest CSE pass grade a score of 1. Intermediate grades were placed in between 
these, and those pupils obtaining no pass in a subject were given a score of 0 for that 
subject. (In the sample used no pupil was entered for both O-level and CSE examina- 
tions in the same subject.) The total score for each pupil was obtained by adding 
together the scores for all the subjects he had entered. The scores were then arranged 
in nine groups containing approximately equal numbers of pupils: this was to 
counteract an element of skewness in the distribution of the raw scores. The relation- 
ship between this variable and the AH4 measure showed negligible deviation from 
linearity. In a one-way analysis of variance the generalised correlation coefficient (η)
was shown to be within 0.02 of the linear correlation coefficient (r).
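
The weighting scheme for examination performance can be sketched as below. The grade labels are an assumption made for illustration (five O-level grades A to E and five CSE grades 1 to 5); the text fixes only the end points (top O-level 9, lowest O-level pass 5, highest CSE pass 5, lowest CSE pass 1, no pass 0). The grouping function is likewise a simple rank-based stand-in for the nine approximately equal groups.

```python
# Assumed grade labels (not given in the text); only the end-point scores are stated.
POINTS = {
    "O-level": {"A": 9, "B": 8, "C": 7, "D": 6, "E": 5},
    "CSE": {"1": 5, "2": 4, "3": 3, "4": 2, "5": 1},
}

def subject_score(exam, grade):
    """Score for one subject; any grade not listed as a pass scores 0."""
    return POINTS[exam].get(grade, 0)

def total_score(entries):
    """Sum of subject scores over all the subjects a pupil entered."""
    return sum(subject_score(exam, grade) for exam, grade in entries)

def nine_groups(totals):
    """Rank pupils by total score and cut into nine groups of near-equal size (1 = lowest).

    Tied totals are split by their order of appearance, a simplification."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    groups = [0] * len(totals)
    for rank, i in enumerate(order):
        groups[i] = rank * 9 // len(totals) + 1
    return groups

# Hypothetical pupil: one top O-level pass, one top CSE pass, one CSE entry without a pass.
pupil_total = total_score([("O-level", "A"), ("CSE", "1"), ("CSE", "U")])
print(pupil_total)
```

Grouping the totals by rank removes the skewness of the raw scores at the cost of discarding the distances between them.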


Relational variables 

Social background. Social background was measured by a factor score based on 
the first principal component (without rotation) derived from data supplied by 
subjects on a questionnaire form: this included information on father's occupation, 
family size, type and ownership of home, and books and newspapers in the home. 
(Questionnaire and details of analysis are available from the authors.) The factor 
scores were grouped in vigintiles for the calculation of correlation coefficients involving
this variable. 


Absence. Absence was recorded as the percentage of half-days absent in the 
fourth year at school (age 14-15) for the cohort studied. Because this variable was
highly skewed (heavy rates of absence appeared in very tiny frequencies) the percentages
were converted to logarithms for the calculation of correlation coefficients.


Conduct ratings. Conduct ratings were based on a five-point scale: very good, 
tolerably good, difficult, awkward and very bad. They were supplied by the senior 
mistress of the school who collated reports on the classroom behaviour of the pupils 
from individual teachers. The distribution of this variable was not normal as a high 
proportion of pupils were given ratings of very good. There appeared, however, no
suitable way of modifying the original distribution. 


Procedure 

The essay was written and the three tests completed on two consecutive days at 
the beginning of the first term of the pupils’ fifth year; the 16+ examinations were
undertaken during the third term of that year. The achievement data were regressed 
against AH4 scores and regression residuals calculated to represent achievement 
relative to the measure of general intelligence (under-achievement/over-achievement). 
The effects of test reliability were considered for the residuals based on reading 
comprehension and mathematics, and an analysis of all the interrelationships formed 
with the four residuals was then completed. 
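
The residual measure described in this procedure can be sketched as follows. This is an
illustration under stated assumptions rather than the authors' computation: the scores
are invented, and `ols_residuals` is a hypothetical helper performing an ordinary
least-squares regression of an achievement measure on the AH4 score.

```python
from statistics import mean

def ols_residuals(x, y):
    """Residuals of y after simple least-squares regression on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Invented scores for six pupils (not data from the study).
ah4 = [40, 55, 60, 70, 85, 90]
reading = [18, 30, 24, 35, 38, 47]

# Positive values: achievement above that expected from AH4 (over-achievement);
# negative values: achievement below expectation (under-achievement).
relative_achievement = ols_residuals(ah4, reading)
print([round(e, 2) for e in relative_achievement])
```

By construction the residuals sum to zero and are uncorrelated with the AH4 scores,
which is why they can be read as achievement relative to measured intelligence.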


RESULTS 


The effects of test reliabilities on two regression residuals, those based on reading 
comprehension and mathematics, were checked by testing the extent to which there 
was a component of under-achievement/over-achievement in each case that was 
genuinely independent of the unreliabilities of the tests. This was done in two stages. 


Differences between the AH4 measure and the attainment tests 

The first stage involved a check on the nature of the relationships between the 
AH4 scores and the scores on the two attainment tests to estimate how far each test
was measuring a different skill, for otherwise there would be no difference whatever
which could be interpreted as an under-achievement/over-achievement variable.
This check was performed by correcting the observed correlations to compensate for
the unreliabilities of the tests, by calculating what are known as the ‘true’ correlations:
these are the correlations that would have occurred between the tests had they been 
perfectly reliable. The proposition is that where the correlation between two perfectly 
reliable tests is manifestly below unity then it may be concluded with confidence that 
they are really measuring different things. 

The procedure used was based on the Spearman formula for the correction of 
attenuation (Lindquist, 1940, p. 233): 

$$ r_{12\,(\mathrm{true})} = \frac{r_{12\,(\mathrm{measured})}}{\sqrt{r_{11} \times r_{22}}} $$

where $r_{11}$ and $r_{22}$ are the reliability coefficients for the two tests. The reliability
coefficients used were those supplied in the test handbooks and had not been obtained 
in conditions that were ideal for the present purpose. For the reading comprehension 
and AH4 tests the test-retest interval (of the order of one month) was much longer
than the interval between the separate test administrations in this investigation, 
while the coefficient for the mathematics test was a split-half for a sample of 100 taken 
from three highly selected classes of children—the figure supplied is the average 
reliability for the three classes. In these conditions the reliability coefficients for all 
three tests are probably lower than would have been obtained from a full test-retest 
of the present comprehensive sample within the two-day interval. This means that 
the figures presented here are probably conservative: that is, the likely discrepancies 
between the reliability coefficients used and the best estimates will inflate the estimated 
contribution of unreliability to the variance of the regression residuals. 

On the basis of the figures in Table 1 and using the Spearman formula, ‘true’
correlations were estimated for the three tests:

AH4 with reading comprehension            0.877
AH4 with mathematics                      0.896
mathematics with reading comprehension    0.840


TABLE 1

CORRELATION MATRIX AND RELIABILITY COEFFICIENTS FOR THE AH4 TEST AND ATTAINMENT TESTS IN
READING COMPREHENSION AND MATHEMATICS

Tests                     AH4        Reading comp.   Maths
AH4                       (0.919)    0.815           0.805
Reading comprehension     0.815      (0.939)         0.763
Mathematics               0.805      0.763           (0.878)

NOTE: The values shown in brackets on the diagonal of the matrix are the reliability coefficients for
the tests which are reported as norms by the distributors of the tests.


These correlations are still less than unity and it may appear, therefore, that the 
regression residuals based on reading comprehension and mathematics are not simply 
artefacts of the unreliabilities of the tests but that the separate tests do measure 
different things. The argument is strengthened by the likely overestimate of un- 
reliability in the coefficients actually used for the calculations. 
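
As a check on the arithmetic, the correction for attenuation can be applied to the
correlations and reliability coefficients printed in Table 1. The sketch below is
illustrative only; the function name `disattenuate` and the dictionary layout are
assumptions introduced here.

```python
import math

def disattenuate(r_measured, reliability_1, reliability_2):
    """Spearman correction for attenuation: r / sqrt(r11 * r22)."""
    return r_measured / math.sqrt(reliability_1 * reliability_2)

# Reliability coefficients (diagonal of Table 1).
reliability = {"AH4": 0.919, "reading": 0.939, "maths": 0.878}

# Measured correlations (off-diagonal of Table 1).
measured = {
    ("AH4", "reading"): 0.815,
    ("AH4", "maths"): 0.805,
    ("maths", "reading"): 0.763,
}

true_r = {
    pair: disattenuate(r, reliability[pair[0]], reliability[pair[1]])
    for pair, r in measured.items()
}
for pair, value in true_r.items():
    print(pair, round(value, 3))
```

To three decimal places the corrected values agree with the ‘true’ correlations quoted
in the text, and each remains below unity.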


The contribution of unreliability to the variance in the residuals 

The proportion of variance in the scores on an attainment test (test 2) which can
be predicted from the scores on the AH4 test (test 1) by means of linear regression is
equal to the square of the correlation ($r_{12}$) between the tests (Hays and Winkler, 1971,
p. 621). The proportion of variance of test 2 which cannot be predicted from test 1
is $1 - r_{12}^2$. This, then, is the proportion of the variance of test 2 which is taken into
the residual and the variance of this residual can be represented in the following
formula:

$$ \sigma^2_{(\mathrm{residual})} = \sigma^2_{(\mathrm{test\ 2})} \times \left(1 - r^2_{12\,(\mathrm{measured})}\right) $$

The variance of the residual if both tests were perfectly reliable would then be

$$ \sigma^2_{(\mathrm{reliable\ residual})} = \sigma^2_{(\mathrm{test\ 2})} \times \left(1 - r^2_{12\,(\mathrm{true})}\right) $$

and the proportion of the variance of the residual measure which results from the
unreliability of the tests is

$$ 1 - \frac{\sigma^2_{(\mathrm{reliable\ residual})}}{\sigma^2_{(\mathrm{residual})}} = 1 - \frac{1 - r^2_{12\,(\mathrm{true})}}{1 - r^2_{12\,(\mathrm{measured})}} $$


This formula yields estimates of 31 per cent and 44 per cent for the random 
components in the residuals based on reading comprehension and mathematics 
respectively. The remaining proportions of variance, 69 per cent and 56 per cent, 
are the components estimated as being due to the genuine difference in the abilities 
measured by the tests. They represent the extent to which there is a genuine variable 
such as under-achievement/over-achievement in these two areas, an achievement 
which is genuinely independent of the measure of general intelligence. 
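
The two percentages quoted above can be recovered from the figures in Table 1. The
following sketch is offered only as a worked check; the function name
`unreliability_share` is hypothetical, and the ‘true’ correlation is obtained with the
Spearman correction described earlier.

```python
import math

def unreliability_share(r_measured, reliability_1, reliability_2):
    """Proportion of residual variance attributable to test unreliability.

    Computed as 1 - (1 - r_true**2) / (1 - r_measured**2), where r_true is
    the measured correlation corrected for attenuation.
    """
    r_true = r_measured / math.sqrt(reliability_1 * reliability_2)
    return 1 - (1 - r_true ** 2) / (1 - r_measured ** 2)

# AH4 (reliability 0.919) with reading comprehension (0.939) and mathematics (0.878).
share_reading = unreliability_share(0.815, 0.919, 0.939)
share_maths = unreliability_share(0.805, 0.919, 0.878)

print(round(100 * share_reading), round(100 * share_maths))
print(round(100 * (1 - share_reading)), round(100 * (1 - share_maths)))
```

Rounded to whole percentages, the first pair gives the random components and the second
pair the components attributed to genuine differences between the tests.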


Relationships between the four residual measures 

Because the residuals are all based on results from a single administration of the 
AH4 test of general intelligence rather than on separate administrations, they are not
strictly independent: correlations between them may have been influenced by some 
common element of performance in the single test. It seems unlikely that any two 
residuals representing achievement in different areas which were based on separate 
administrations would ever be negatively correlated, although they might have a 
correlation of zero. In this case, therefore, the probable limit for the association that
could be introduced by the common test may be indicated by the lowest observed
correlation: here that is the correlation of 0.068 between the residuals for mathematics
and essay. The inflation of a possible zero correlation to one of 0.068 through
the use of non-independent residuals is clearly not serious, and the corresponding 
corrections to the coefficients in Table 2 are so small as not to be worth making. 


Under present conditions the coefficients in Table 2 are the same as partial
correlations between the individual tests and the examination results after control for
variation in AH4 scores (Bohrnstedt, 1969, p. 118). By the standards of partial
correlation they are high, and they are statistically significant except for that between
the residuals based on essay and mathematics. This low correlation indicates that
the specific kinds of skill required for the separate tasks are independent of each other.
Perhaps the most economical demonstration of this proposition is by a principal
component analysis (without rotation) of the whole set of correlations (Table 3):
this gives a very clear display of the structure of relationships between the variables
(Hope, 1968: p. 42).


TABLE 2

CORRELATIONS BETWEEN FOUR RESIDUAL MEASURES

                                                             O-level and
Tests                     Reading comp.   Essay     Maths    CSE
Reading comprehension     -               0.395     0.310    0.452
Essay                     0.395           -         0.068    0.411
Mathematics               0.310           0.068     -        0.415
O-level and CSE           0.452           0.411     0.415    -

All the coefficients are statistically significant (P < 0.01) except for that between essay
and mathematics.


TABLE 3

LOADINGS OF THE FOUR RESIDUAL MEASURES ON TWO FACTORS:
PRINCIPAL COMPONENT SOLUTION

Tests                     Component 1   Component 2
Reading comprehension      0.77          -0.10
Essay                      0.65          -0.63
Maths                      0.59           0.72
O-level and CSE            0.82           0.07

In this analysis there appears a first component of general achievement on which 
all measures load heavily in the same direction, and a second component displaying 
the opposition between the mathematics and essay residuals with the other measures 
located in the middle, between these two. 
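
The unrotated principal component solution can be reproduced approximately from the
correlations between the four residual measures. The sketch below is not the authors'
procedure: it uses power iteration with deflation, chosen here only to keep the example
free of external libraries, and all function names are hypothetical. Loadings are taken
as each unit eigenvector scaled by the square root of its eigenvalue.

```python
import math

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dominant_eigenpair(m, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    v = [1.0 / (i + 1) for i in range(len(m))]  # arbitrary starting vector
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v

def principal_component_loadings(corr, n_components=2):
    """Unrotated loadings obtained by repeated extraction and deflation."""
    m = [row[:] for row in corr]
    size = len(m)
    loadings = []
    for _ in range(n_components):
        lam, v = dominant_eigenpair(m)
        column = [math.sqrt(lam) * x for x in v]
        # Eigenvector sign is arbitrary; make the largest loading positive.
        if max(column, key=abs) < 0:
            column = [-x for x in column]
        loadings.append(column)
        # Remove the extracted component before finding the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return loadings

# Correlations between the residual measures as printed in Table 2
# (order: reading comprehension, essay, mathematics, O-level and CSE).
table_2 = [
    [1.000, 0.395, 0.310, 0.452],
    [0.395, 1.000, 0.068, 0.411],
    [0.310, 0.068, 1.000, 0.415],
    [0.452, 0.411, 0.415, 1.000],
]

component_1, component_2 = principal_component_loadings(table_2)
print([round(x, 2) for x in component_1])
print([round(x, 2) for x in component_2])
```

The first column should show four positive loadings of similar size, and the second
should set essay against mathematics, in line with the description in the text.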


Relationships between the residual measures and other variables 

The correlations in Table 4 are those between specific components of attainment 
independent of the measure of general intelligence and the selected non-academic 
variables. They represent part correlations (Bohrnstedt, 1969): these will usually 
have values numerically similar to partial correlations. 
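
The distinction between part and partial correlations can be illustrated from three
zero-order correlations. The figures below are invented for the purpose and are not
taken from the study; x stands for a background variable such as absence, y for an
attainment measure and z for the AH4 score.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from both variables."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def part_corr(r_xy, r_xz, r_yz):
    """Correlation of x with the residual of y after regressing y on z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz ** 2)

# Invented zero-order correlations (assumptions for illustration only).
r_xy, r_xz, r_yz = 0.40, 0.20, 0.80

part = part_corr(r_xy, r_xz, r_yz)
partial = partial_corr(r_xy, r_xz, r_yz)
print(round(part, 3), round(partial, 3))
```

The two coefficients differ only by the factor sqrt(1 - r_xz squared), so they are close
whenever the background variable is weakly related to the control variable.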


The results show that boys obtained significantly lower scores relative to their
AH4 scores than girls except in mathematics where the relationship with sex was very
small. All the other correlations were significant, indicating that pupils from poorer
social backgrounds, those with poorer records of attendance at school and those with
poorer ratings of conduct in school all performed worse on each of the relative
achievement measures. The higher correlations tended to be with the behavioural
variables.


TABLE 4

CORRELATIONS BETWEEN FOUR RESIDUAL MEASURES AND OTHER VARIABLES

                                                               O-level and
Variable               Reading comp.   Essay       Maths       CSE
Sex                    -0.155*         -0.306**    -0.014      -0.329**
Social background       0.140*          0.202**     0.206**     0.272**
Absence                 0.253**         0.336**     0.143*      0.507**
Behaviour rating        0.250**         0.222**     0.267**     0.412**

* P < 0.05   ** P < 0.01

Positive relationships with social background, absence and behaviour indicate that
low scores on the attainment tests, essay and examination performance (measured relative
to AH4 scores) are associated with poor ratings on these variables: the negative relationships
with sex indicate a tendency for boys to perform worse than girls.

DISCUSSION 


Residuals based on the regression of achievement tests against intelligence 
measures will necessarily exhibit a normal distribution so long as the tests themselves 
are linearly related and normally distributed. Among normal children, therefore, 
variation in these achievement tests will occur throughout the range of measured 
intelligence; correlations with the residuals will be the equivalent of partial or part 
correlations with the initial achievement measure after control for measured
intelligence. For the sample studied in this enquiry it has been shown further: (i) that the
regression residuals for reading comprehension and mathematics contain a substantial
component which is not due simply to the unreliabilities in the constituent tests;
(ii) that there are significant associations between residuals based on different tests
and academic areas, and between tests and examinations separated by a period of 
some eight months (Table 2); and (iii) that there are significant associations between 
all the residuals and the variables of social background, attendance and conduct 
rating in school (Table 4). The results point to a degree of stability in the indices 
based on residuals and suggest that their relevance to the analysis of achievement may 
extend beyond the particular test occasion. The hypothesis presented in Yule et al.
(1974), namely that under-achievement/over-achievement characterises normal children
and may correlate with a range of environmental influences, is thus corroborated.


Given this conclusion it may be appropriate to reconsider the implications of the 
term under-achievement/over-achievement. In Yule et al. (1974) the term
under-achievement is used in two senses. (a) It refers to the pathological under-achievement
of a very small set of children: this phenomenon is indicated by the over-population
of the distributional tail in an otherwise normal distribution of regression residuals.
(b) It represents the negative aspect of a single, normally distributed dimension:
under-achievement/over-achievement. It is with the implications of sense (b) that the
present report has been concerned. This, however, highlights a well known conceptual 
difficulty for it embodies the notion of a balance between under-achievers and over- 
achievers which is the natural outcome of the recommended regression procedure. 
Such a balance is in conflict with the assumption of a level of potential achievement 
which has been associated with the notion of under-achievement: subjects are often
defined as under-achievers because they failed to reach their potential. But it is not
clear in what sense a level of achievement which is often exceeded—just as often, in 
fact, as it is missed—can be described as a level of potential. The concept of potential 
seems realistic in an empirical context only when the majority of children of any 
particular level of general intelligence reach or are very near to that potential, with 
some children below this mode and only very few (if any) above it. Since the use of 
regression residuals with normally distributed and linearly related data implies nearly 
symmetrical distributions of residuals, the concept of potential achievement does not 
seem applicable to these kinds of data. If potential achievement is insecure, the 
associated idea of under-achievement in this context is also problematic: if there is 
doubt about the standard, failure to reach that standard cannot be defined. 


The conclusion reached here, therefore, is that where interest lies in the normal 
range of scores given by regression residuals for achievement against intelligence, the 
concept of potential achievement might be better replaced by a concept of normal 
achievement, the mean level attained by a group of particular general intelligence. 
The associated differential measure would then be relative achievement, achievement 
in specific tasks measured relative to an expectation based on the achievement of peers 
in age and general intelligence. The concept of under-achievement would be reserved 
for groups, such as those identified in Yule et al. (1974), which occupy an extreme and 
unusual position in the structure, groups with supposedly pathological deficiencies. 


Enquiring further into the characteristics of this variable of relative achievement, 
an analysis was completed of the interrelationships between the four residual measures 
(Tables 2 and 3). This indicated two components of relative achievement: a general 
component on which all four measures were highly loaded, and a component opposing 
essay and mathematics. With the variables so limited to four it is not possible to infer 
the exact character of the second component. However, it may be hypothesised that 
two kinds of skills are demonstrated: a skill involved in the planning and organisation 
of solutions to open-ended tasks; and a skill in solving more closely defined problems 
that are set within a given structure. The virtually complete independence of essay 
and mathematics attainment once measured intelligence has been controlled suggests 
that these two kinds of skill are separable, although the principal component analysis 
indicates that reading comprehension and performance in the 16+ examinations 
involve skills of both kinds. 


Analysis of the relationships between the four residual measures and the 
behavioural and background variables seemed to indicate a syndrome in which 
attainment, conduct, absence and social background are all connected independently 
of measured intelligence (which does not, of course, imply the exclusion of measured 
intelligence from this syndrome). The close association within this syndrome of 
relative achievement with the behavioural variables might support the hypothesis that 
attainment in school does involve a motivational component that operates to some 
extent independently of measured intelligence, and that the major component in 
relative achievement for normal children can perhaps be considered as a behavioural
variable in its own right. 


The suggestions made here about the character of relative achievement variables, 
and about the structure of their relationships, are clearly tentative and require further 
investigation. In particular, it would be useful to extend the range of tests, both of 
attainment and general intelligence, and to use a longer time span with larger samples 
in order to study patterns of change. Nevertheless, it does seem clear from the 
present results that relative achievement can be measured on scales which possess 
reasonable stability and which are significantly related to measurable aspects of 
background and behaviour: these may inform future studies of the character and 
causes of differential attainment in schools.


ASSESSING BEHAVIOURAL ADJUSTMENT TO SCHOOL


By M. B. YOUNGMAN 
(School of Education, University of Nottingham) 


Summary. The development of a 34-item self-report measure of behavioural adjustment
to school is described in detail. Analysis identified three distinct scales: studiousness,
compliance and teacher contact. All three scales and the total score are shown to have
acceptable reliability and well-defined construct validity within a normal lower secondary
school population. Various applications of the instrument, including the use of a
teacher form, are also discussed.


INTRODUCTION 


THE school environment has little in common with the pattern of experience a pupil 
encounters in the world outside. Features such as the crowding together of so many 
children and adults, long periods of continuous concentration or the need to plan 
days ahead all combine to make unusual demands on the child's adaptability. Yet 
many of these environmental components are implicit and essential parts of the 
educational process. To benefit fully from schooling the child must come to terms 
with most of them. 


The concept of school adjustment has no generally accepted meaning so measures 
of adjustment to school often relate to particular stages or types of schooling. For 
example, Thompson (1975) examined adjustment at the infant stage, while Youngman and
Lunzer (1977) devised scales measuring adjustment around the period of transfer
from primary to secondary school. A brief outline of these measures can be obtained 
from Youngman (1978). Cohen (1976) offers numerous other examples. When a 
more general assessment has been required the tendency has been to use a measure 
of maladjustment such as Stott's (1974) or Rutter's (1967). However, this type of 
scale rarely shows adequate discrimination within the normal school population. 
This article describes the construction and validation of a behavioural measure of 
adjustment to school that is applicable to most children between the ages of 
approximately 9 to 13 years. 

METHOD 


Item selection and analysis 

The principles determining the selection of individual items are explained in 
detail in Youngman (1969). The main requirement was that items should describe 
typical school behaviour, which in turn helped meet the further need to be as objective 
as possible. This is similar to Stott's prerequisite that a behaviour “description must
be as independent as possible of the rater to the subject” (Stott et al., 1975, p. 6).
Dimensions considered particularly important were those of obedience, compliance
and organisation. The first two are very close to the concept of docility as presented
by Punch and Rennie (1978). An inventory of 40 such items was constructed and 
administered to two secondary school samples (Bennett and Youngman, 1973). The 
first sample comprised 274 12-year-old children, the second 288 13-year-olds. Oblique 
factor analysis (Kaiser and Rice, 1974; Youngman, 1976) of the first sample generated 
three interpretable factors suitable for scale construction. Subsequent item analysis 
of these three subscales scored on the second cross-validation sample resulted in 34 
items being accepted when a point-biserial item-total correlation cut-off of 0.3 was
applied. Table 1 gives the scale sizes and reliabilities as estimated by Cronbach's 
alpha. Items and scale allocations are reproduced as an appendix to this article. 
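
The two statistics used in this item analysis, the item-total correlation and Cronbach's
alpha, can be sketched for dichotomously scored items as follows. The responses are
invented, the function names are hypothetical, and the item-total correlation is
computed against the uncorrected scale total, which is an assumption: the text does not
say whether the item itself was excluded from the total.

```python
import math
from statistics import mean, pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    item_variances = [pvariance([person[j] for person in responses]) for j in range(k)]
    total_variance = pvariance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def item_total_correlations(responses):
    """Pearson (point-biserial for 0/1 items) correlation of each item with the total."""
    totals = [sum(person) for person in responses]
    mean_total = mean(totals)
    result = []
    for j in range(len(responses[0])):
        item = [person[j] for person in responses]
        mean_item = mean(item)
        cov = sum((a - mean_item) * (b - mean_total) for a, b in zip(item, totals))
        denom = math.sqrt(sum((a - mean_item) ** 2 for a in item)
                          * sum((b - mean_total) ** 2 for b in totals))
        result.append(cov / denom)
    return result

# Invented data: six respondents, four items scored 1 (keyed answer) or 0.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(responses)
item_r = item_total_correlations(responses)
retained = [j for j, r in enumerate(item_r) if r >= 0.3]  # cut-off of 0.3
print(round(alpha, 3), [round(r, 2) for r in item_r], retained)
```

With real data the same cut-off would be applied scale by scale, retaining only those
items whose correlation with their own scale total reaches 0.3.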


TABLE 1

CROSS-VALIDATION RELIABILITIES FOR THE THREE
SUBSCALES AND TOTAL (N = 288)

Scale              No. of items    alpha
Studiousness       12              0.77
Compliance         15              0.77
Teacher contact     7              0.60

Total              34              0.86


The reliability values are good for the two largest subscales and for the total
score. Scale names reflect the character of constituent items. So, for example, the
studiousness scale contains items like ‘Can you keep on working for a long time?’
(YES) and ‘Do you often talk to the person next to you?’ (NO). Compliance includes
‘Is your work usually neat?’ (YES) and ‘Have you been punished by a teacher quite
often?’ (NO) and as such inclines towards compliance with regulations. The teacher
contact scale brings together all the items on relationships with teachers. As a scale
it is rather small (hence its lower reliability) and whilst it does have a clear and
distinct character, its main contribution is towards the total adjustment score.


Having completed the item analysis and reliability checks on these two samples, 
the rescored three-scale version was applied to a third sample of 348 second-year 
secondary school children to obtain basic statistics and so that cross-validation 
analyses could be performed. 


Characteristics of the scales

Table 2 gives basic statistics for the three scales and the total score, for all three
samples. Compliance and teacher contact show a slight tendency to curtailment but
as the distributions in Figure 1 show, all the measures apart from teacher contact
record good discrimination. Since the original item selection employed oblique
factor analysis the three scales are likely to be correlated; Table 3 gives product
moment correlations for the third sample.


TABLE 2

SCALE STATISTICS FOR THREE SECONDARY SCHOOL SAMPLES

                    Sample 1            Sample 2            Sample 3
                    Lancashire          Lancashire          Nottinghamshire
                    12-year-old         13-year-old         12½-year-old
                    N = 274             N = 288             N = 348
                    Mean      SD        Mean      SD        Mean      SD
Studiousness         5.52     3.06       4.72     2.98       5.54     2.82
Compliance          11.16     3.14      10.11     3.24      11.23     2.88
Teacher contact      4.88     1.66       4.80     1.67       5.07     1.41
Total               21.57     6.18      19.63     6.51      21.84     5.67


FIGURE 1
DISTRIBUTIONS OF THE THREE SUBSCALES AND THE TOTAL SCORE (N = 348)


TABLE 3

PRODUCT MOMENT CORRELATIONS FOR THE SCALES AND TOTAL (N = 348)

                    Studiousness    Compliance    Teacher contact
Studiousness        -               -             -
Compliance          0.574           -             -
Teacher contact     0.312           0.246         -

Total               0.808           0.856         0.530


Studiousness and compliance are quite highly inter-correlated and therefore they 
also have high correlations with the total score. Teacher contact is again seen to be 
different from the other scales, having only low correlations with them. Nevertheless 
it does correlate moderately highly with the total score. 


In addition to having overall statistics available, it may be useful to know of any 
differences in score patterns recorded by various subgroups. Clearly age is likely to 
affect score levels but there is not yet sufficient information at hand to provide 
reliable figures for younger age groups. However, boys and girls have been analysed 
separately and as Table 4 shows, there are significant differences on three of the four 
associated measures. In all instances the girls score higher. 


TABLE 4

COMPARISON OF BOYS AND GIRLS ON SCALES AND TOTAL SCORE (N = 348)

                    Boys (N = 176)      Girls (N = 172)
                    Mean      SD        Mean      SD        t-value
Studiousness         5.12     2.80       5.96     2.78      2.78**
Compliance          10.27     2.88      12.21     2.53      6.63**
Teacher contact      5.04     1.40       5.10     1.42      0.42
Total               20.44     5.69      23.27     5.26      4.81**

** P < 0.01
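
The t-values in Table 4 can be approximated from the printed means and standard
deviations. The sketch below assumes a pooled-variance independent-samples t-test; the
article does not state which form was used, so this is an assumption, and the compliance
mean for girls is read as 12.21. Small differences from the published values are to be
expected because the summary statistics are rounded.

```python
import math

def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Independent-samples t statistic (pooled variance) from summary statistics."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    return (mean_2 - mean_1) / standard_error

N_BOYS, N_GIRLS = 176, 172

# (boys mean, boys SD, girls mean, girls SD) as read from Table 4.
table_4 = {
    "Studiousness": (5.12, 2.80, 5.96, 2.78),
    "Compliance": (10.27, 2.88, 12.21, 2.53),
    "Teacher contact": (5.04, 1.40, 5.10, 1.42),
    "Total": (20.44, 5.69, 23.27, 5.26),
}

t_values = {
    scale: pooled_t(boys_mean, boys_sd, N_BOYS, girls_mean, girls_sd, N_GIRLS)
    for scale, (boys_mean, boys_sd, girls_mean, girls_sd) in table_4.items()
}
for scale, t in t_values.items():
    print(scale, round(t, 2))
```

Positive values indicate higher scores for girls; only the teacher contact difference
falls well short of conventional significance levels.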


Validation 

Although acceptable reliability estimates and other statistical data have been 
presented for the three scales and the total score, it is still necessary to demonstrate 
the validity of these scales as measures of school adjustment. In particular, is it 
legitimate to interpret the total score as a general measure of behavioural adjustment 
to school? With no equivalent measure available, validity is assessed in terms of 
construct validity. 


The precise nature of the constructs identified by the inventory can be determined 
by comparing children’s scores on the four behaviour measures with their scores on 
other related measures. Table 5 supplies comparative data for two samples. Firstly 
there are the correlations between the behaviour measures and the three Junior 
Eysenck Personality Inventory scales, all scored on sample 2 (N = 288). Studiousness
and compliance manifest similar patterns, recording near-zero correlations with 
extraversion. The moderate negative correlations with neuroticism imply a positive 
relationship with stability, as one would hope. Interestingly correlations with the lie 
scale are even higher, indicating a strong element of conformity in these two behaviour 
scales. The teacher contact measure records a positive correlation with extraversion, 
but lower correlations than the other two scales with stability and the lie scale. This 
helps explain the difference between studiousness and compliance on one hand and 
teacher contact on the other since the latter clearly delineates the outgoing and co- 
operative aspect of classroom behaviour. The total score correlations are similar to 
those for studiousness and compliance for this analysis. 


A second more detailed comparison of the behaviour measures is available via
the third sample comprising 348 secondary school children. This sample featured in
a study of primary/secondary transfer (Youngman and Lunzer, 1977) and full
information on the measures applied can be obtained from that source. Briefly the
data assess attitude, personality and achievement before and after transfer from the
junior school to the comprehensive. The variable names shown in Table 5 give some
indication of the nature of each comparative measure. The blocking separates junior
attitude and personality, secondary attitude and personality, and achievement.


TABLE 5

VALIDATION CORRELATIONS BETWEEN THE BEHAVIOUR MEASURES AND SELECTED PERSONALITY AND
ACHIEVEMENT MEASURES

                                                    Behaviour measures
                                          Studious-                 Teacher
Validation measures                       ness        Compliance    contact     Total

Sample 2: N = 288
 1 Extraversion                           -08         -04            26**        01
 2 Neuroticism                            -37**       -33**         -13*        -36**
 3 Lie scale                               50**        38**          22**        48**

Sample 3: N = 348
 4 Junior attitude to secondary            00          00            02          01
 5 Junior attitude to primary              21**        38**          16**        34**
 6 Junior apprehension over transfer       06          08           -08          05
 7 Junior self-concept: social            -03          11*           05          05
 8 Junior self-concept: personal           08          24**          11*         19**
 9 Junior self-concept: academic           21**        27**          16**        28**
10 Junior academic motivation              28**        41**          18**        39**

11 Secondary attitude to secondary         31**        26**          20**        33**
12 Secondary attitude to primary          -11**       -06            02         -08
13 Secondary anxiety                       00          00           -06         -02
14 Secondary self-concept: social          00          02            14**        05
15 Secondary self-concept: personal        08          15**          09          14**
16 Secondary self-concept: academic        29**        27**          21**        33**
17 Secondary academic motivation           39**        46**          23**        48**

18 Junior non-verbal reasoning             05          21**          13*         16**
19 Junior reading comprehension            06          32**          07          21**
20 Junior mathematics                      06          26**          12*         19**
21 Secondary reading comprehension         03          29**          04          17**
22 Secondary mathematics                   02          30**          07          18**

NOTES: Decimal points omitted.
       Significance indicated ** P < 0.01, * P < 0.05.


Table 5 enables several separate validation assessments to be made. Firstly it is 
possible to answer the earlier question as to how far the total score provides a general 
measure of adjustment to school. Variables 5 (junior attitude to primary school) and 
11 (secondary attitude to secondary school) both measure reactions to the present 
school. The moderate correlations of 0.34 and 0.33 respectively between these
variables and the total score indicate some similarity of construct. Additional 
confirmation of this relationship arises from the near zero correlations between the 
total score and the two variables measuring attitude to another school, variables 4 and 
12. Also the total score correlates even more strongly with the two academic moti- 
vation measures, variables 10 and 17. The achievement measures complete the 
picture since here the correlations are lower but again significant and positive. 
Finally, although validation is seen predominantly in terms of the presence of 
relationships, it is also important to remember that the absence of a relationship may 
help define a construct. In that the total score records zero correlations with the
personal attributes of anxiety (variable 13), apprehension (variable 6) and extraversion
one can be surer that the total score does define adjustment to school, and not a less 
specific construct such as general personal adjustment. Together with the reliability 
estimate of 0.86 these findings do support using the total score as an indication of
behavioural adjustment to school. 


Finer discrimination within this construct can be obtained from the three scale 
scores. Although the structural difference between the studiousness and compliance 
scales has not been apparent up to this point, Table 5 does finally clarify the issue. 
Specifically the achievement correlations (variables 18 to 22) indicate that studiousness 
is independent of academic performance whilst compliance is consistently positively 
correlated with all the performance measures. The only other difference between the 
two scales, showing compliance to be more strongly associated with personal self- 
concept, is relatively small. The teacher contact scale registers correlations similar 
to, but weaker than, those for studiousness and compliance. However, apart from 
the observation already made regarding its relationship with extraversion, there is no 
evidence to suggest that the teacher contact construct operates in a different manner 
from the other two. 


Applications 

The validity assessment above supports using the total inventory score as a 
general measure of behavioural adjustment to school. Alternatively where more 
detailed discrimination is required, particularly if intellectual and non-intellectual 
adjustment are to be identified, the three subscales can be scored separately. A 
further refinement involves producing a teacher form so that the teacher can assess 
the children’s adjustment rather than rely upon self-report. This is likely to be a 
more appropriate procedure with younger children, or with poor readers. If this is 
done it is essential that the teacher should assess every child on an item before 
proceeding to the next item, otherwise there is the danger of prejudging individual 
children. 


Two earlier applications of the scale have already been mentioned. In one 
(Bennett and Youngman, 1973) combining the 40 original inventory items with the 
60 Junior Eysenck Personality Inventory items enabled four syndromes of school 
adjustment to be identified. Similarly Youngman and Lunzer (1977) also demon- 
strated the ability of the total adjustment score to distinguish between groups of 
children adjusting well or poorly to school transfer. Currently the instrument is 
being used to investigate school adjustment in relation to home background, to assist 
special school placement at junior school level, and to evaluate the effects of special 
school provision. 
APPENDIX: Behaviour-in-School Inventory with scoring


Name 


School 


When you have finished, check to make sure you haven't missed any. 




Do you often look out of the classroom window? 
Have you had things taken from you by the teacher? 
Is your work usually neat? 

Do you nearly always answer if teacher asks you a question?
Do you often talk to the person next to you in class? 
Do you sometimes run errands for the teacher? 

Do you find it difficult to sit still for a long time? 

Is your writing easy to read?

Do your books get scruffy quickly?

Are you often late for your lessons? 

Are you usually quiet in class? 


Do you nearly always put your hand up if a teacher asks a question?


Do you sometimes daydream?

Have you nearly always got a pen or biro with you? 
Have you been punished by a teacher quite often? 

Do you always do your homework? 

Have you been in any fights in school? 

Have you often dropped or spilled things in class? 

Do you walk quietly about the school? 

When the teacher is talking, do you always pay attention? 
Do you ever ask the teacher questions?

Can you keep on working for a long time? 


Do you usually have all the books and other things you need for lessons?


Do you sometimes leave work unfinished?

Do you mostly work on your own?

Do you ever push other boys or girls about? 

If you can't do the work, do you ask the teacher for help?
Do you often ask to leave the room?
Do you always do as you are told without complaining?
Do you answer back if a teacher tells you off?

Do you sometimes start laughing or giggling in class? 

Do you sometimes shout out answers before you are asked? 
Do you always ask for help if you get stuck with your work? 
Do you always ask the teacher before you leave your place? 


SCALE ALLOCATION
1 = STUDIOUSNESS
2 = COMPLIANCE
3 = TEACHER CONTACT


Read them very carefully. 




SCORING DIRECTION 
HAS PIAGET’S CONSTRUCT OF FORMAL OPERATIONAL
THINKING ANY UTILITY? 


By M. SHAYER 
(Chelsea College, London) 


Summary. The relation between Piaget’s logical theory of formal operational thinking 
and his account of cognitive development is discussed. The consistency of the latter is 
tested by a factor-analytical study of the performance of 481 14-year-olds on five formal 
tasks. The content and concurrent validity of the group tests developed is reported, 
together with test-retest and internal consistency reliabilities. Formal operational 
thinking was found to be a unitary construct, and both heterogeneity of performance 
and décalage were found to be smaller than a recent review had suggested. Yet it was 
concluded that a theory possessing predictive power is still elusive. 


INTRODUCTION 


IT is some time since any public defence of Piaget’s work on adolescents has been 
published, almost certainly because further experimental evidence was needed. 
Following the publication of The Growth of Logical Thinking (Inhelder and Piaget, 
1958) four steps would have been needed before it could become established as part 
of the apparatus of developmental psychology. The first step, that of replication, 
was already completed and published by Lovell in 1961. The second, that of critical 
testing of each of the Inhelder problems for consistency of behavioural description 
and stage categorisation, has been done only for Pendulum (Somerville, 1974) and 
Chemical Combinations (Dale, 1970). The third step, testing the Piagetian account 
of formal operational thinking by giving a wide range of problems to the same subjects 
and looking at their ability to characterise the level of a subject's competence, could 
not be done properly without the second step.  Factor-analytical studies done 
without the second step, such as those of Lovell and Shields (1967) and Bart (1971), 
were supportive but not conclusive. The fourth step, that of producing develop- 
mental norms, was done neither at Geneva nor elsewhere. In this paper some aspects 
of the work required are reported, and then related to the methodological problems 
which await resolution. 


The Piagetian account of formal operational thinking has three ‘tiers’. The
lowest is that of the behavioural descriptions cited in the Inhelder problems. The 
next tier depends upon the different levels of behaviour reported by Inhelder and her 
students on each of the problems. On these Piaget imposes a classificatory ordering, ex- 
pressed developmentally as the progress from pre-operational through concrete to for- 
mal operational thinking. The third tier is that of a structuralist meta-theory, couched 
in terms of symbolic logic. There is no doubt that this theory was a very fruitful 
heuristic for Piaget and his co-workers, but its prominence in The Growth of Logical 
Thinking has given rise to much confusion. Both critics and proponents of Piaget's 
account of adolescent development have taken the third tier as though it can be used 
as a deductive theory. Thus Raven (1973, 1975) used a series of problems derived 
from the theory to create what was purported to be a test of operational thinking. 
A similar procedure was used initially to create what was to be the operational 
thinking sub-scale of the new British Intelligence Scale (Warburton, 1966; NFER, 
1978). Parsons (1960), in an early review, showed that Piaget's logical symbolism 
was unorthodox and argued that his scientific work was therefore unsound. Ennis
(1975) developed Parsons’ argument by suggesting that, since the meta-theory cannot 
be used to deduce either of the lower two tiers, the lower two tiers of the Inhelder/ 
Piaget account must be unsound and useless. 
There is thus a considerable problem within the philosophy of science of how to
approach the validity of Piaget’s work. As this point has already been made elsewhere 
(Shayer, 1978a), here the work of Wason and Johnson-Laird (1972) will merely be 
adduced as experimental evidence that the development of adolescent thinking cannot 
satisfactorily be modelled in terms of the propositional calculus. But the fact that a 
more powerful theory than Piaget’s own is needed to explain the underlying mechanism 
of his second tier account of formal operational thinking is no grounds for not using 
a sufficiently well-researched model. Although equivocal statements can be found 
in some of his many publications, Piaget (1972) makes clear that the essence of his 
work is the developmental classification (the second tier), by which a wide range of 
behaviours of a given individual can be related to the same overall construct. It is for 
this reason that the word ‘utility’ was used in the title of this article where ‘validity’
might have been expected. In applied research the criterion has to be quantitative: 
is there sufficient predictive validity from Piagetian measures to, say, tests of the 
understanding of science, for the Piagetian paradigm to be of use? 


If there is no unitary construct underlying the various Piagetian measures, their 
predictive validity for any aspect of educational research could never be high. Thus 
the obvious experimental test is to take a representative range of different Piagetian 
problems, and find out to what extent specific performances on each of the tasks can 
be unified by the imposition upon them of Piaget's developmental hypothesis of stages. 
Yet as late as 1974, Blasi and Hoeffel could still, with reason, write: “Oddly enough
no reliable empirical data exist relevant to what now can be called the cognitive-
developmental hypothesis of adolescence”. The small samples necessitated by the
use of clinical interview, the global judgments in which the interview results were 
reported, and the lack of agreement between authors of different papers as to the level 
at which to categorise behaviours, all contributed to the absence of solid data by which to
decide for or against the Piagetian hypothesis. For example, conclusions suggested 
in Kuhn et al. (1977) about the ages of development of thinking in correlational 
problems, as distinguished from control of variables (exemplified by the Pendulum 
problem), depend upon the developmental levels ascribed. Yet the Inhelder/Piagetian 
levels were found by Lovell (1961) to be too high for the correlation problem. Like- 
wise, the notion that formal operational thinking in the Pendulum problem occurs a 
year or more earlier than in other Piagetian tasks such as Chemical Combinations is 
made explicitly by Rowell and Hoffmann (1975) and implicitly in Kuhn and Angelev 
(1976). Yet these conclusions depend upon the subject's strategy for the control of 
variables being categorised as the behavioural characteristic of formal thinking in the 
Pendulum problem. Once control of variables is shown to be only a transitional 
behaviour, these conclusions fail. This was found in a study of about 600 14-year-olds 
reported by Shayer (1978a). The only way out of these difficulties is to produce
group tests corresponding to the original Inhelder problems so that the behaviours 
involved can be related to each other by the usual psychometric techniques. The use 
of such group tests for producing Piagetian developmental norms for subjects in the 
age range 10 to 16 has already been reported (Shayer et al., 1976; Shayer and Wylam, 
1978). In this paper the development of four more group tests, and their use to test 
Piaget's developmental hypothesis will be described. 


METHOD 
Choice of tests 
In The Growth of Logical Thinking there are 15 problems which serve as opportu- 
nities for research. Piaget himself divides them into those involving propositional 
logic, and those involving operational schemata. Others classify them differently, and 
the choice was made both on the grounds of practicality, and of covering the different 
types of content. The three main types of intellectual content seemed to be (i) control 
of variables and deduction of which variables are effective, (ii) equilibria in physical
systems, (iii) combinations and correlations. In addition, two of the problems which
might have been used in a group test (Communicating Vessels and Hydraulic Press)
appeared to depend upon the subject having previously internalised the concept of 
pressure from formal science teaching. These were rejected in favour of problems in 
which the whole structure could be made accessible to the subject under the conditions 
of the test, and did not depend on specific pre-existing knowledge. The problems 
chosen for conversion to group test were Pendulum (Chapter 4), Flexible Rods 
(Chapter 3), Equilibrium in the Balance (Chapter 10), Inclined Plane (Chapter 12) 
and Chemical Combinations (Chapter 7). This was the maximum number which it 
was thought could be administered to a large sample across a wide ability range. 
Control of variables (and propositional logic) are covered by the first two; Equilibria 
(and operational schemata) are the content of the next two, and Combinations occurs 
in the last. 


The advantage of group tests is threefold. One can ensure that every subject is 
taken through the same adequate and representative set of demonstrations so that 
there is no doubt that a good chance to understand the problem has been given. 
Each of the behaviours described by Inhelder can be represented by test items, so 
that data analysis can examine the infra-structure of each task, and can relate the 
infra-structure of one task to that of another. Last, but not least, the combined 
experience of a team and many subjects can be built into the progressive improvement 
of each task as a research instrument. In addition, the use of demonstrations during 
the test gives the subject the kind of feedback which he might have given himself had 
he the apparatus. The development and validation of these tests is reported in 
Shayer (1978a). Further information is published in NFER (1979). 


Sample and gathering of data 

Although behaviours are listed at lower levels, the main development charted in 
The Growth of Logical Thinking is from Late Concrete (2B) to Late Formal (3B) 
operational thinking. For this test 14-year-olds were chosen, both because this was 
the maximum age (third years) to whom access can readily be obtained in secondary 
schools, and because the above range of development can be found amongst them. 
By calculation from the survey results published earlier (Shayer et al., 1976), it could 
be seen that the upper 40 per cent of 14-year-olds should give the range required. 
Although it was known in advance that there was no sex differential on the Pendulum 
task for people of this age (Shayer and Wylam, 1978), it could not be assumed that 
the same would apply to the four other Inhelder problems. It seemed best to test 
Piaget's construct on boys primarily, but to make sure that there were enough girls 
to enable comparative data to be collected and analysed. Bearing in mind that there 
were to be five separate tests and that, for factor analysis, it would be necessary for 
at least 200 subjects to have completed all of them, about 400 boys were aimed at, to 
allow for wastage. The staff in three boys’ grammar schools were asked to provide 
all their third year pupils (10 classes in all), and the staff in two comprehensive schools 
(9 and 10-form entry) agreed to allow the top four forms of each school to be tested. 
There was an equal number of girls and boys in the comprehensive schools. To this 
sample was added the third year of a four-form entry girls’ grammar school to ensure
that there were enough high ability girls for the necessary comparisons to be made. 
This gave a total sample of 647: 429 boys and 218 girls. 


Each of the tasks took about 50 minutes to administer to a class. The testing 
was done between the half-term in the spring term and the half-term in the summer 
term, 1978. Further to reduce the number of occasions in which the schools would 
need to be visited, two of the tasks (Equilibrium in the Balance and Inclined Plane) 
were administered in the same session, which took between 70 and 80 minutes in all. 




RESULTS 


In Table 1 the performance of the total sample is given on each of the five tasks. 
It can be seen that there is some variation from task to task in the proportion at the 
different sub-stages, but very little difference in the mean levels on the different tasks. 


Validity and reliability of measures 

Before proceeding to data analysis, it was necessary to check that the group tests 
did correspond to what was reported in The Growth of Logical Thinking, that group 
testing did not cause the subjects to think at a different level from individual interview, 
and that the group tests were precise enough in their assessment of the optimum level 
of thinking of the subjects. In other words, the content validity, concurrent validity 
and reliability of the tasks had to be estimated. This does not entail the assumption 
that the clinical interview estimate is necessarily the primary standard to which other 
measurements should be referred (Tuddenham, 1970). For each of the five tasks 
children were interviewed as described by Inhelder, but the categorisation of their 
behaviours both in interview and by group test was standardised, so that it was only 
the methods of testing which were being compared. For each of the tasks, behaviours 
were listed (see Shayer, 1978a) corresponding to each of the sub-stages, as was done
by Somerville (1974) for the Pendulum task. These are based strictly on the original 
Genevan work. Reliability of the group tests was estimated both by internal con- 
sistency analysis of the main sample results, and by separate determination of the 
test-retest correlations. The latter was done with a three-month gap between the two 
occasions of testing. Validity was investigated by interviewing children between a 
week and a month after they had received the group tests. In Table 2 the various 


TABLE 1

PERCENTAGE OF SAMPLE (N ≈ 550) AT DIFFERENT LEVELS ON TASKS

                                               Level
                              2B-    2B          2B/3A   3A     3B         Mean
Task                                 (Late                      (Late      level*
                                     concrete)                  formal)

Pendulum                      4.3    18.4        28.0    27.6   21.8       3.44
Equilibrium in the balance    4.9    18.9        21.8    36.6   17.8       3.44
Inclined plane                7.8    20.3        20.7    38.5   12.7       3.28
Chemical combinations         5.4    15.2        27.7    38.8   12.8       3.38
Flexible rods                 6.7    14.3        22.4    38.5   18.1       3.47

* Based on coding 2B- = 1 up to 3B = 5.


TABLE 2

RELIABILITY AND VALIDITY DATA FOR FIVE FORMAL TASKS

                              Internal       Test-retest         Group test/Interview
Task                          consistency    correlation (N)     correlation (N)

Pendulum                      0.83           0.79 (24)           0.71 (24)
Equilibrium in the balance    0.84           0.77 (31)           0.55 (18)
Inclined plane                0.76           0.82 (33)           0.63 (15)
Chemical combinations         0.76           0.64 (28)           0.65 (23)
Flexible rods                 0.86           0.85 (38)           0.79 (23)

Median                        0.83           0.79                0.65
correlations are recorded. Internal consistency was calculated using the KR-20
formula, and the other correlations were transformed, before tabulation, into an 
estimate of what they would have been if the sample variance had been the same as 
that on which the internal consistency was made. 
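
As an illustration of the internal consistency calculation named above, the following is a minimal sketch of the KR-20 formula for items marked right or wrong. The function name and the small data set are hypothetical and are not taken from the study; the variance of total scores is taken over the whole group (divisor N), which is an assumption about the convention used.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for dichotomous (0/1) items.

    item_scores: one row per subject, each row a list of 0/1 item marks.
    """
    n_subjects = len(item_scores)
    k = len(item_scores[0])  # number of items in the test
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_subjects
    # Variance of total scores, using divisor N (assumed convention)
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects
    # Sum over items of p * q, where p is the proportion passing the item
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_subjects
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical marks for six subjects on four items (not data from the study)
example_marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
reliability = kr20(example_marks)
```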


Before some of these data can be interpreted, it is necessary to check if there is 
any systematic difference between group test and individual interview. In Table 3 the 
mean sample levels are compared, as calculated on an assumed equal-interval scale 
running from below-late-concrete (2B- = 1), through 2B = 2, 2B/3A = 3, 3A = 4 (early
formal) to 3B = 5 (late formal).
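
The mean levels quoted on this scale can be recovered from a percentage distribution over the five levels. The sketch below uses the Pendulum and Inclined Plane percentages from Table 1; the function and variable names are illustrative only.

```python
# Equal-interval coding assumed in the text: 2B- = 1 up to 3B = 5
LEVEL_CODES = {"2B-": 1, "2B": 2, "2B/3A": 3, "3A": 4, "3B": 5}


def mean_level(percentages):
    """Weighted mean level from a {level: percentage} distribution."""
    total = sum(percentages.values())  # may differ slightly from 100 through rounding
    weighted = sum(LEVEL_CODES[level] * pct for level, pct in percentages.items())
    return weighted / total


# Percentages as given in Table 1
pendulum = {"2B-": 4.3, "2B": 18.4, "2B/3A": 28.0, "3A": 27.6, "3B": 21.8}
inclined_plane = {"2B-": 7.8, "2B": 20.3, "2B/3A": 20.7, "3A": 38.5, "3B": 12.7}

pendulum_mean = mean_level(pendulum)
inclined_plane_mean = mean_level(inclined_plane)
```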

TABLE 3

COMPARISON OF CLASS-TASK AND INDIVIDUAL INTERVIEW MEAN LEVELS

                                             Individual
Task                          Group test     interview      Gain

Pendulum                      3.63           3.92           +0.29
Equilibrium in the balance    3.22           3.78           +0.56
Inclined plane                3.93           3.80           -0.13
Chemical combinations         3.78           3.61           -0.17
Flexible rods                 4.13           4.17           +0.04

Mean                                                        +0.12


TABLE 4

INTER-TASK CORRELATIONS. N = 481: SUBJECTS WHO COMPLETED ALL TASKS

Tasks                         Pend      EB        IP        CC

Equilibrium in the balance    0.537
Inclined plane                0.561     0.647
Chemical combinations         0.504     0.492     0.564
Flexible rods                 0.631     0.624     0.615     0.686


The mean gain shows no systematic difference. Granted this conclusion, the
fact that the mean test/interview correlation (0.67) is lower than that of test/retest
(0.77; z = 1.63, P < 0.05 one-tailed) supports the inference that there is some extra
source of variation when subjects are interviewed, over and above the variability
found when given the group test, which could well be the result of the person/person
interaction in the interview. Although the interview/interview correlation was not
determined, the most reliable estimate of this in the literature, that of Lawson et al.
(1974) on the tasks Pendulum, Equilibrium in the Balance and Flexible Rods, was
0.61. The mean test-retest correlation for these tasks (Table 2) was 0.80. Using
Lawson et al.'s result for the interview/interview correlation, one could predict a
value of $(0.80^{1/2} \times 0.61^{1/2}) = 0.70$ (assuming perfect attenuation-corrected correlation)
for the task/interview correlation, very close to the mean value obtained (0.68).
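
The prediction above follows from the usual attenuation relation, in which the observed correlation between two measures is the true correlation multiplied by the square root of the product of their reliabilities. A minimal sketch, with a hypothetical function name and the true correlation assumed to be 1.0 as in the text:

```python
import math


def predicted_observed_r(reliability_a, reliability_b, true_r=1.0):
    """Observed correlation expected between two fallible measures."""
    return true_r * math.sqrt(reliability_a * reliability_b)


# 0.80: mean test-retest value for the three tasks; 0.61: Lawson et al. (1974)
predicted = predicted_observed_r(0.80, 0.61)
```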


Factor analysis of five Piagetian tasks 

An initial minimum test of validity which the construct of formal operational 
thinking would have to pass would be that of factorial unity when each task is taken 
as an independent variable. Each subject's overall level of performance on each task 
was estimated for the five levels from 2B- = 1 to 3B = 5. This was treated as an
equal-interval scale for the purpose of deriving an initial correlation matrix between 
the tasks. This matrix is given in Table 4. Despite some sex differentials it was 
found that addition of the girls to the boys’ sample made little difference to the factor
analysis, so the whole sample was used. 


The SPSS PA2 program was used for the factor analysis (Nie et al., 1975). One
factor had an eigenvalue above unity (2.95) which explained 59.0 per cent of the
variance. Factor loadings are shown in Table 5.
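
The eigenvalue and the percentage of variance quoted here can be checked against the loadings in Table 5: for a single factor, each communality is the squared loading, and the eigenvalue is the sum of the squared loadings. The sketch below performs only this arithmetic check; it does not reproduce the SPSS PA2 procedure itself, and the variable names are illustrative.

```python
# Single-factor loadings as given in Table 5
loadings = {
    "Pendulum": 0.719,
    "Equilibrium in the balance": 0.747,
    "Inclined plane": 0.780,
    "Chemical combinations": 0.727,
    "Flexible rods": 0.858,
}

# With one factor, the communality of a task is its squared loading
communalities = {task: value ** 2 for task, value in loadings.items()}

# Sum of squared loadings, and its share of the total variance (one unit per task)
eigenvalue = sum(communalities.values())
percent_variance = 100.0 * eigenvalue / len(loadings)
```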


These results, then, confirm the presence of a strong unitary construct underlying 
the five tasks. On the factor analysis model the total variance is partitioned between 
the common factor variance and the variance unique to each task. Since an estimate 
of the error variance is available from the test-retest reliabilities, it is then possible to 
estimate the proportion of the specific variance for each task to the common factor 


TABLE 5

FACTOR ANALYSIS OF FIVE TASKS

                              Factor
Task                          loading      Communality

Pendulum                      0.719        0.52
Equilibrium in the balance    0.747        0.56
Inclined plane                0.780        0.61
Chemical combinations         0.727        0.53
Flexible rods                 0.858        0.74


TABLE 6

ESTIMATES OF SPECIFIC AND COMMON VARIANCE FOR FIVE TASKS

                                             Test-retest    Specific
Task                          Communality    R              variance

Pendulum                      0.52           0.78           0.26
Equilibrium in the balance    0.56           0.77           0.21
Inclined plane                0.61           0.81           0.20
Chemical combinations         0.53           0.65           0.12
Flexible rods                 0.74           0.85           0.11



variance. This is given in Table 6 which seems to justify the use of a unifying construct 
such as operational thinking to describe what these five tasks have in common. 
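
The partition described here can be sketched as follows. The specific variance is taken as the reliable variance (the test-retest coefficient) less the common factor variance (the communality), which matches the figures in Table 6; the remaining error variance, one minus the reliability, is an inference from the model and is not tabulated in the text.

```python
# (communality, test-retest R) as given in Table 6
tasks = {
    "Pendulum": (0.52, 0.78),
    "Equilibrium in the balance": (0.56, 0.77),
    "Inclined plane": (0.61, 0.81),
    "Chemical combinations": (0.53, 0.65),
    "Flexible rods": (0.74, 0.85),
}


def partition(communality, reliability):
    """Split unit variance into common, specific and error components."""
    specific = reliability - communality  # reliable variance not shared with the factor
    error = 1.0 - reliability             # unreliable variance (assumed, not tabulated)
    return {"common": communality, "specific": specific, "error": error}


partitions = {task: partition(h2, r) for task, (h2, r) in tasks.items()}
specific_variances = {task: round(p["specific"], 2) for task, p in partitions.items()}
```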

However, a much more powerful factorial test of the unity of Piaget’s construct 
would be to increase the number of variables and so decrease the likely strength of 
the first factor extracted. Because each test consisted of 14 to 20 items, the different 
schemas within each task could be identified with groups of items. Eleven variables 
were defined, and are listed in Table 7. Each variable consists of between three and 
nine items. As there are several overlaps from task to task (e.g., control of variables 
both in Pendulum and Flexible Rods), factorial analysis might produce separate 
factors for each of these schemas. Another objective of the analysis would be to 
check on Piaget’s interpretation of each task. In Chemical Combinations, for example, 
to what extent is the ability of a person to know theoretically all the combinations of 
four liquids related to his ability to reason successfully about the results obtained, and 
also to see the right experimental strategy for finding the role of each liquid? Since 
these two behaviours are separately represented by item-clusters, a quantitative answer 
should be possible. 
TABLE 7

FACTOR ANALYSIS OF 11 WITHIN-TASK CLUSTERS (UNROTATED)

                                                           Factor
Task                          Variable                    1        2        Communality

                              Control of variables        0.64     -0.24    0.47
Pendulum                      Deduction of effect         0.63     -0.22    0.45
                              [...] proportionality       [illegible]
Equilibrium in the balance    Equilibrium of system       0.63     0.28     0.47
                              Proportionality             0.65     0.12     0.44
Inclined plane                Equilibrium of system       0.62     0.16     0.42
                              [...] interpretation        [illegible]       0.32
                              Combinations                [illegible]       0.11
Chemical combinations         Strategy and deduction      0.67     -0.14    0.47
                              Control of variables        0.74     -0.21    0.59
Flexible rods                 Compensation                0.68     -0.11    0.48


As before, the SPSS PA2 programme was used and again there was only one
factor with an eigenvalue greater than unity (4.31 dropping to 0.45) which explained
39.2 per cent of the variance. The loadings from the two factor solution are shown in
Table 7, to show how the two tasks involving the equilibrium of physical systems are 
to some extent differentiated from the others which more obviously require logical 
reasoning. The division between the variance explained by these two factors indicates 
that the first factor is much the stronger, accounting for 91 per cent of the variance 
taken out by these first two factors. This result must be taken as much stronger 
evidence for the unitary nature of the group tests which were designed to measure 
Piaget's construct of formal operational thinking. 
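
The two percentages quoted in this paragraph follow directly from the eigenvalues. A short arithmetic check, with illustrative variable names:

```python
first_eigenvalue = 4.31   # first factor, as quoted in the text
second_eigenvalue = 0.45  # second factor, as quoted in the text
n_variables = 11          # within-task clusters listed in Table 7

# Share of the total variance (one unit per variable) taken by the first factor
percent_of_total = 100.0 * first_eigenvalue / n_variables

# Share of the variance extracted by the two factors that belongs to the first
percent_of_extracted = 100.0 * first_eigenvalue / (first_eigenvalue + second_eigenvalue)
```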


However, the very low communality of the Combinations variable from the
Chemical Combinations test suggests that one of Piaget's insights is untenable. One
of the implications of his meta-theory is that combinatorial thinking, as realised by 
the ability to know all the combinations of four or more objects, should be intimately 
related to other aspects of formal operational thinking. Yet the variable not only has 
a very low correlation with every other variable in the battery, but even with the rest 
of the Chemical Combinations task (strategy and deduction) it has the lowest 
correlation bar one of all the other variables. 


Evidence of décalage 

Much has been written about ‘décalage’, the apparent fact that subjects will
perform at different Piagetian levels, depending on the type of problem which they
are given. Unfortunately the discussion is too often non-quantitative, and the
problem is not differentiated from that of the reliability of the measures employed.
Given the inevitable variation between different estimates of Piagetian level with a
reliability of 0.8, a range of scores on the five tasks, {3B, 3A, 3A, 3A, 2B/3A} for a
subject is to be expected. Décalage certainly becomes a serious problem when a 
person is estimated at two distinct Piagetian stages by different tasks: that is, Late 
Formal (3B) by one, and Late Concrete (2B) or less by another. In Table 8 these 
disparities between tasks are indicated in terms of the percentage of individuals being 
classified with a difference of a whole stage or more on two tasks. 


The first figure is the percentage referred to the whole sample of about 520 
subjects having done both tasks. The figure below in brackets is the column per- 
centage, i.e., of those classified 3B by one of the tasks named in the columns, x per 
cent are classified 2B or below by the corresponding row task. It can be seen that 
TABLE 8

PERCENTAGE OF SUBJECTS CLASSED AT 3B LEVEL BY ONE TASK ASSESSED AT 2B OR
BELOW BY ANOTHER TASK

                                        Level on first task 3B (N ≈ 520)
Level on second task 2B or 2B-
Task                          Pend      EB        IP              CC              FR

Pendulum                      -         0.8       0.4             0               0.5
                              -         (4.2)     (2.9)                           (3.0)

Equilibrium in the balance    0.8       -         0.2             0.8             0.8
                              (3.5)     -         [illegible]     (4.3)           (4.3)

Inclined plane                1.5       0.2       -               0.8             1.1
                              (6.8)     (1.0)     -               (6.5)           (6.5)

Chemical combinations         0.9       1.0       0.6             -               0.5
                              (4.3)     (5.3)     (4.4)           -               (3.1)

Flexible rods                 0.2       0.9       0.8             0.2             -
                              (0.8)     (5.3)     (5.9)           [illegible]     -



the former average less than 1 per cent of the total sample. Of those assessed as 3B 
on any one task, an average of less than 4 per cent are assessed as non-Formal on any 
one other task. However, of those classified 2B/3A by at least three tasks, only 6 
per cent are assessed 3B by any other task, and of those subjects classified 2B by at 
least three tasks, none are assessed 3B by any other task. It seems, therefore, that the 
rather high probability of 0.04 that, given Late Formal performance on a task, a
subject will show only Concrete performance on another should be read as a failure 
rate rather than a statistic about individual variation. It is the proportion of those 
who, for one reason or another, fail to find the key to solving a task for which their 
performance on another suggests they possess the competence. 
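
The two kinds of percentage reported in Table 8 can be sketched as follows for one pair of tasks. The function name and the data are hypothetical and serve only to show how the whole-sample percentage differs from the column percentage; they are not figures from the study.

```python
def decalage_rates(pairs, high=5, low=2):
    """Whole-stage disparity rates for one pair of tasks.

    pairs: (level on first task, level on second task) per subject, coded 1 to 5.
    Returns (percentage of the whole sample, percentage of those at `high`
    on the first task) who are at `low` or below on the second task.
    """
    n = len(pairs)
    n_high = sum(1 for first, _ in pairs if first == high)
    n_split = sum(1 for first, second in pairs if first == high and second <= low)
    pct_of_sample = 100.0 * n_split / n
    pct_of_high = 100.0 * n_split / n_high if n_high else 0.0
    return pct_of_sample, pct_of_high


# Hypothetical sample of 100 subjects: 21 at 3B on the first task,
# of whom one is at 2B on the second
hypothetical_pairs = [(5, 5)] * 20 + [(5, 2)] * 1 + [(3, 3)] * 79
sample_pct, column_pct = decalage_rates(hypothetical_pairs)
```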


DISCUSSION 


It can now be argued that the confusion in the literature on formal operational 
thinking (some of which was reported in Brown and Desforges, 1977) is due more to 
the lack of a progressing tradition of research than to any deficiencies in the original 
Genevan work. By 1974, 19 years after the initial publication of The Growth of 
Logical Thinking, no one had produced work which would decide between the original 
assumption that all adolescents develop formal operational thinking, that between a 
quarter and a third do (Dulit, 1972), or even that maybe only 5 per cent of 14-year-olds 
do (Shayer, 1970). With limited samples rarely above 20 for each year of age by the 
exigencies of testing by interview, the wide range of published results (Blasi and 
Hoeffel, 1974) is no evidence necessarily for heterogeneity of performance, but may 
merely express sampling variation. If subsequent work had been as timely and 
thorough as that of Lovell (1961) the picture would be very different. 


Correlation evidence 

Strangely enough, Lovell’s paper is not mentioned in Brown and Desforges’ 
(1977) discussion of the definition of a stage. Instead, papers are cited for the formal 
operational stage which give the impression that correlations between tasks are of the 
order of 0.3 to 0.4. Yet Lovell reports, for the tasks Chemical Combinations,
Pendulum, Hydraulic Press and Falling Bodies, a Kendall W of 0.89, from which one
can calculate that the mean Spearman rho for this set is 0.85. So much of the
information is retained in calculating rho that the numerical value can be taken as a
good approximation to a Pearson r (Guilford, 1965, p. 229). Truly there are few
correlations reported in the literature, but this only makes it the more important to
cite those which are there. Lee (1971) reports a value of 0.85 for Shadows and
Equilibrium in the Balance. Bart (1971) gives a value of 0.73 for this correlation, and
0.59 for Pendulum with Equilibrium in the Balance. The mean correlation for the
five tasks used in the present study was 0.58.
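
The step from Kendall's W to a mean Spearman rho uses the standard identity between the coefficient of concordance for m rankings and the average of the rank correlations among them. A minimal sketch, with a hypothetical function name:

```python
def mean_rho_from_kendall_w(w, m):
    """Average Spearman rho among m rankings, given Kendall's W."""
    return (m * w - 1) / (m - 1)


# Lovell's four tasks, with the W of 0.89 quoted in the text
mean_rho = mean_rho_from_kendall_w(0.89, 4)
```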


Are all these values incompatible? When correlation evidence is surveyed it is 
essential to relate it to the ranges of the samples used. For example, the lowest of 
the correlations from the Schwebel (1975) paper cited by Brown and Desforges is that 
of 0.30 for the tasks Equilibrium in the Balance and Inclined Plane, for men. The
range is very restricted: 7 per cent of the men were assessed Late Concrete, 63 per cent
Early Formal, and 30 per cent Late Formal. On the same equal-interval scale as
used in the present study, the standard deviation is only 0.748. The CSMS¹ result for
boys, for this pair of tasks, was r = 0.67, σ = 1.06. When the Schwebel result is
adjusted to match the CSMS by the formula which corrects for restriction of range of
parallel tests, the correlation becomes 0.65. Likewise, in the Lovell paper, a mean
rho as low as 0.36 is given for the correlations among the set Chemical Combinations,
Hydraulic Press, Falling Bodies and Correlations. Yet this is for a group of prepara-
tory school pupils, among whom σ = 0.76. The previous group was a very wide age
and ability range sample, for whom σ = 1.595. When both these correlations are
adjusted to match the average CSMS task standard deviation of 1.10, they become
0.69 and 0.68 respectively. Thus the two results in the Lovell paper indicate the same
degree of relationship despite the widely differing correlation values. In a repre-
sentative sample of adolescents taken from one year group the task/task correlation
is of the order of 0.6 or more and is over 0.8 if the sample also includes younger
children whose thinking is restricted to the concrete operational stage. 
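
The adjustments quoted above are reproduced by the standard relation for the reliability of parallel tests in groups of differing variability, which holds the error variance constant across groups. That this is the formula the author used is an inference from the figures, since the text does not state it; the function name is hypothetical.

```python
def adjust_for_range(r_observed, sd_observed, sd_target):
    """Correlation between parallel tests re-expressed for a group of different spread.

    Assumes the error variance, sd**2 * (1 - r), is the same in both groups.
    """
    return 1.0 - (sd_observed ** 2 / sd_target ** 2) * (1.0 - r_observed)


# Schwebel (1975), men: r = 0.30 at SD 0.748, adjusted to the CSMS SD of 1.06
schwebel_adjusted = adjust_for_range(0.30, 0.748, 1.06)

# Lovell (1961): two samples adjusted to the average CSMS task SD of 1.10
lovell_prep_adjusted = adjust_for_range(0.36, 0.76, 1.10)
lovell_wide_adjusted = adjust_for_range(0.85, 1.595, 1.10)
```

Under this assumption the restricted and the very wide samples converge on nearly the same value, which is the point the paragraph makes.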


The problem of concordance and correlation 

A related source of confusion in the Piagetian literature (not confined to the 
formal stage) lies in the failure to differentiate between the concepts of the correlation 
of tests, and the concordance of results of individuals on different tests. This was 
gradually elucidated by Wohlwill in the sixties (Flavell and Wohlwill, 1969; Wohlwill, 
1973). In the first paper the authors were led to predict that individual variation on 
different tests will be a maximum at the early phase of development of a stage, and 
fall to a minimum in the final phase. In the second Wohlwill describes four different 
models for development, each of which should give a different correlation surface. 
On the Piagetian account of disequilibration, the model is one in which the surface is 
different from that of normal bivariate correlation. The normal surface is homeo- 
scedastic both in rows and columns, and across the diagonals. The Piagetian surface is 
nearly homeoscedastic in rows and columns, but is considerably ‘waisted’ across the 
diagonals at the completion of a stage. The diagonals of the surface express the variation 
of the individuals’ scores on the two tests, in relation to their mean scores. As the indi-
viduals enter a new stage (e.g., the formal stage from the concrete stage) so their variation 
on different tasks becomes very random and large. As they develop, their performance 
becomes more and more consistent across tasks. If a correlation coefficient is 
calculated for a sample who are mainly in the early phase of development of a stage, 
the value will be inflated by the wide range in the marginals for their levels. But if a 
sample is taken from those who are at the completion stage, the calculated correlation 
will be much lowered by the absence of variation, even though most individuals are 
scored at the same level. This was illustrated above in the Schwebel sample. The 
concordance statistic required is that of the standard deviation of each subject's levels 
on the different tasks he has done. In a sample selected from the survey by Shayer 
et al. (1976), the root mean square value of the statistic among 282 subjects decreased 
from 1.095 at mean level 2A (Early Concrete) through 0.688 at 2A/2B down to a 
minimum of 0.465 at mean level 2B (Late Concrete), and rose again to 0.966 at 2B/3A, 
the beginning of the development of the formal stage. Likewise in the present study 
(Shayer, 1978a) the concordance statistic showed a maximum, across the five tasks, 
at mean level 2B/3A, of 0.910, and dropped steadily as the mean level of the subjects 
rose towards Late Formal. Only in a sample drawn from a wide age-range spanning 
two major stages, as in Lovell (1961) and Lee (1971), will the correlation value express 
the degree of relationship between different tasks. Unfortunately the concordance 
concept required has not been investigated in the statistical literature, although there 
is little difficulty in principle (this corresponds to the distinction between R-technique 
and P-technique in the factor-analytical literature). 
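
The concordance statistic described above can be sketched as follows. This is an illustration only: the numeric coding of levels, the sample data and the function names are hypothetical, and the use of the population form of the standard deviation is an assumption, since the text does not say which form was used.

```python
import math
import statistics


def subject_spread(levels):
    """Standard deviation of one subject's levels across the tasks taken.
    Population SD is assumed here; the source does not specify the form."""
    return statistics.pstdev(levels)


def rms_concordance(subjects):
    """Root mean square of the per-subject spreads for a group of subjects."""
    spreads = [subject_spread(levels) for levels in subjects]
    return math.sqrt(sum(s * s for s in spreads) / len(spreads))


# Hypothetical data: levels coded on an equal-interval scale, one list per subject.
consistent = [[4, 4, 4, 4, 4], [5, 5, 5, 5, 5]]   # same level on every task
scattered = [[2, 4, 6, 4, 4], [3, 5, 5, 7, 5]]    # levels vary from task to task

print(rms_concordance(consistent))  # 0.0
print(rms_concordance(scattered))
```

A low value means that subjects are scored at much the same level on every task, whatever the between-subject correlation happens to be; a high value means that each subject's levels vary widely from task to task.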


Construct validity 

The issues relating to the status of Piaget’s work are well explored in an article 
by Cronbach and Meehl (1955). The purpose behind this discussion of validity was 
not to add one more exacting demand to test practice, but to head off an over- 
deductive approach to psychological theory. It is a defence of the inductive method 
in science. What does one have to do in validation of a theory in psychology to bring 
the method in line with that of the traditional sciences? In principle you have to 
start with ‘inductive summaries’, and test the underlying construct by factor-
analytical and internal-consistency methods. Then, by a ‘bootstraps’ procedure you 
work gradually, as in physics, to the point where a meta-theory will explain the initial 
construct. In chemistry the status and utility of the Periodic Table may be cited as 
an example of the first phase. Thomson’s (1897) original theory was brilliant and 
wrong; yet the Table is used every day by chemists still, despite the fact that Bohr was 
the first to produce a consistent underlying theory which ‘explained’ its features. 
Many of the criticisms of Piaget's work read as though the authors demand either the 
final product of theoretical development, or nothing at all. 


The work reported here relates to the developmental construct, not to the symbolic 
logic meta-theory. The construct is that people's minds have reality-processing 
mechanisms whose operations on reality can be described. At least three hierarchical 
levels can be differentiated (pre-operational, concrete and formal operational), and 
they are achieved by each individual by his general process of interaction with the 
world. Further, it is possible to characterise each person's capacity in terms of a 
level at which he presently functions (at his best). The experimental evidence offered 
relates strictly to the utility of such a construct, and it is claimed that it is substantial. 


It might be thought that the conclusion to be drawn for fundamental research 
from the results offered in this paper could be clearly differentiated from that which 
could be suggested for applied research. The latter would appear to receive support 
from the empirical evidence: the former would receive further stimulus for work on 
the meta-theory from the validation of the formal construct. But on closer inspection 
it is found that this distinction vanishes. For applied research one needs, in addition 
to the ability to estimate the optimum Piagetian level for a person, to be able to 
estimate the level of thinking required to succeed at a certain level in a given task. 
For fundamental research one wishes to provide a meta-theory which explains the 
developmental construct. But as soon as one tries to do either (assuming that 
physiological access into the ‘black-box’ concerned is not a possibility) one finds 
oneself in the same endeavour: that of trying to provide rules by which particular 
tasks can be assessed for their level of intellectual demand. It is in fact only by 
examining the products of the mind that one can study the mind: this much of the 
behaviourist position is true. Thus the slightly different attempts of both Pascual-
Leone (1970), Pascual-Leone and Smith (1969) and Halford (1978) to ‘explain the 
appearances’ in terms of the chunks of information which can be related to each 
other simultaneously in short-term memory, and Piaget’s symbolic logic and mathe-
matics of sets meta-theory, both have to be judged by their power to analyse and 
predict a variety of tasks. The former are estimated empirically in subjects by 
measures such as the backward digit span from the WISC test (Scardamalia, 1977; 
Halford, 1978), the latter by Piagetian tests such as described in this article. But in 
both cases the utility only appears when the theory can independently estimate a scale 
of levels for new intellectual tasks. That this can be done for secondary school 
science activities by a taxonomical cross-referencing to Piaget’s work was demon- 
strated by Shayer (1978b). With the information-processing approach the difficulty 
appears as soon as the attempt is made to decide how many chunks of information 
are involved in solving any new problem: the heavy labour shown by Scardamalia 
in analysing the M demand of even a very simple combinatorial task shows that there 
is no case yet to be made for the wider power of this theory compared with that of 
Piaget. The interest of Halford’s work lies in the fact that for some areas of mathe- 
matical activity a method of analysis based on the number of operations and variables 
involved is sketched. But it is not yet in general form, and it is difficult to see what 
moves might be required, for example, in estimating the difficulty of ‘understanding’ 
the binomial theorem. What none of these authors (including Piaget) has provided 
is some way in which an alternative method of analysis can move beyond being a 
convenient rationalisation of what their own general experience and intuition has 
already estimated, and can work for others lacking the same expertise. From this 
point of view the conclusions of Brown and Desforges were a timely reminder of the 
existing confusion and a pointer toward work to be done. But it cannot be agreed 
that the search for generality is inappropriate. The reason that we need to produce 
taxonomies of behaviours for specific areas of the curriculum is precisely because 
much richer data-sources are required to inform the integrating theory which has yet 
to be created. Only if the production of empirical taxonomies is related to the 
development of underlying theory will the facts gathered be more than unrelated 
Baconian fragments. Why abandon an empirically validated theory-base where 
nothing else can yet be offered? 
THE EDUCATIONAL UTILITY OF PIAGET: A REPLY 
TO SHAYER 


By C. DESFORGES AND G. BROWN 
(Department of Educational Research, University of Lancaster) 


SUMMARY. Shayer's article argues for the utility of Piagetian stages. We contend that 
the case has not been made. It is argued that the pursuit of accuracy in identifying 
Piagetian stages may be counterproductive to the more important task of providing 
appropriate curriculum sequences. 


For the past several years Shayer has worked on one of the most important, intractable 
and under-researched problems in educational psychology. The problem he has 
addressed is that of arranging instructional materials in an optimal learning order; 
in other words, the problem of matching tasks to the learner's attainments and 
abilities. Elsewhere he has noted that ‘matching’ has traditionally been something 
of an art based on accumulated professional experience. He has noted that “No 
learning model was used, but by implication it was one of accommodation by the 
student to the course” (Lovell and Shayer, 1978, p. 110). He has pointed out the 
deficiencies of such a tradition and we entirely agree with his comments in this respect. 


In an attempt to optimise the sequence of teaching/learning experiences for 
pupils Shayer has turned to Piaget's theory of cognitive development. He has used 
it to analyse the conceptual demand of several contemporary science schemes (Shayer, 
1972, 1974) and endorsed its merits as a heuristic device for the design of subsequent 
schemes (Shayer, 1978). He has not been alone in this venture but his work is 
exemplary in its thoroughness and scholarly discipline. However, the attempt, in 
our view, is not worthwhile. 


The idea that all experience is not equally assimilable at all ages is an ancient 
one. The educational corollary that learning experiences should be matched to the 
child's developmental attainment is similarly familiar. Indeed in promulgating this 
view Piaget (1971) recognised that he was suggesting nothing new. What classical 
educators had failed to do was to establish a pertinent body of psychology necessary 
for working out educational techniques ‘truly adapted to the laws of mental develop-
ment’ (Piaget, 1971, p. 143). It is Piaget’s view that psychologists’ right to a hearing 
in education must be based on their establishing such a body of scientifically valid 
knowledge. 


He further claims to have gone some way to establishing just such a body of 
valid knowledge. We have stated elsewhere our criticisms of this view (Brown and 
Desforges, 1977) and since then other extensive criticisms of the validity of 
Piaget's theory have been published (Brainerd, 1978; Brown and Desforges, in press; 
Donaldson, 1978; Siegel and Brainerd, 1978). 


Setting aside the contentious issue of the validity of Piaget's theory, Shayer seeks 
to establish the utility of some part of it. This makes sense, at least as a short term 
strategy. If Piaget's description of stage-related behaviours assists with the sequencing 
of curriculum materials why should these be rejected just because his theoretical 
account of those behaviours appears untestable or, where testable, unconfirmed? 
In seeking utility, Shayer focuses on the ‘developmental construct’ which asserts 
that ‘... people's minds have reality-processing mechanisms whose operations on 
reality can be described. At least three hierarchical levels can be differentiated ... 
and ... it is possible to characterise each person's capacity in terms of a level at 
which he presently functions at his best’. More concretely this claim implies that 
performance on certain classical Piagetian tasks can be used to characterise a person's 
developmental level. Other tasks (e.g., curriculum tasks) could be analysed to 
ascertain their intellectual demands in similar terms. These latter tasks can then be 
matched to the developmental level of the learner. 


How would one assess the utility of such a notion? Clearly one would have to 
establish that a set of tasks could reliably determine a person's developmental level. 
Furthermore one would then have to establish that curriculum tasks could be analysed 
and ordered in the same terms. As Shayer notes, ‘... utility only appears when the 
theory can independently estimate a scale of levels for new intellectual tasks’. 
Finally, and most crucially, one would then have to establish that the derived ordering 
provided an optimal learning sequence. 


Whilst Shayer claims to have demonstrated the utility of the ‘developmental 
construct’ it is clear that he has done no such thing. He has not used it to order 
‘new intellectual tasks’ nor has he assessed the effect in terms of learning outcomes 
of such an ordering. He has reported elsewhere on attempts to do so. With respect 
to a Nuffield Combined Science examination Lovell and Shayer (1978) assessed the 
developmental level of candidates and the cognitive demand of examination items in 
Piagetian terms. Facilities on the examination items were as follows: 


                           Level of examination item
Level of pupil (N = 90)    Stage 2B    Stage 3A    Stage 3B

Stage 3A                      77          47          14
Stage 2B                      35          12           2


(after Lovell and Shayer, 1978, p. 120) 


This clearly shows that as items get harder fewer pupils can tackle them successfully. 
It also shows that better pupils can tackle more items at all levels of difficulty than 
weaker pupils. That we can be no more specific than this is made clear when we 
notice that neither group of children could succeed on even half the items judged 
appropriate for them by preliminary Piagetian analysis. In the present article Shayer 
does not tackle this crucial aspect of utility. Instead he concentrates solely on 
demonstrating the reliability of performances on particular tasks. 


He has certainly established that five of Piaget’s classic tests of formal operations 
reliably evoke typical behaviours. This is certainly an essential step in establishing 
the utility of Shayer’s procedure. However, in itself it is not very interesting. His 
findings show that the tasks ‘hang together’ in some way, but leave open the reason 
for that relationship. It might equally well be attributed to the effects of intelligence 
or science teaching as to the development of operational thinking. Indeed, the gist 
of a paper cited by Shayer (Piaget, 1972) seems to suggest that Piaget thinks so too. 
In this paper Piaget suggests that, unlike the period of concrete operations, there is a 
much greater diversity of aptitude with age during adolescence and that ‘our fourth 
period can no longer be characterised as a proper stage...’ (p. 9). 


Thus the inter-task correlations, although reasonably high, cannot readily be 
interpreted. Shayer seems to have forgotten that he was seeking to show the utility 
of these measures and to have slipped off into establishing their validity in terms of 
their unity of operational demand. If the inter-task correlations were low, but the 
tasks had individual utility, no doubt they would be used. But if they correlated 
perfectly would Shayer recommend that we dispense with four of them, without 
considering their contributions to the sequencing of curriculum tasks? It seems to 
us that using Piaget’s ‘developmental construct’ for this purpose raises far more 
serious problems than merely utilitarian ones, however. 


First it demands the assumption that a sequence mapping on to natural develop-
ment is necessarily the best sequence. This, as Brainerd (1978) has recently pointed 
out, is a form of the ‘Mother Nature knows best’ hypothesis. Yet as we have 
elaborated elsewhere (Brown and Desforges, in press), Mother Nature is not entirely 
reliable in this respect. For example, there is ample evidence of large sections of 
populations apparently failing to reach the level of concrete operations which Piaget 
considers necessary for rational thought. 


Next, it does not encourage people to operate optimally. In the formal opera- 
tional context for example, it has been shown that providing subjects with even 
minimal instruction or minor hints radically improves their capacity to succeed on 
‘formal operational’ tasks (Siegler and Liebert, 1975; Siegler et al., 1975; Danner 
and Day, 1977). It thus seems foolish to delay tasks demanding formal operations 
when they can so readily be evoked. Third, it assumes that the lack of requisite 
operations indicated by failure on Piagetian tasks will cause failure in learning if the 
curriculum items are of a corresponding level. The evidence suggests that this is not 
so. It has been shown, for example, that of a sample of female university students 
who were coping satisfactorily with first year undergraduate maths, 61 per cent did 
not conserve volume (Towler and Wheatley, 1971). Rowell and Renner (1976) 
reported that 18 per cent of 138 postgraduate students failed tests of conservation of 
volume, and Schwebel (1975) noted that “Two subjects with approximately equivalent 
high school rank ... perform as differently as this: Given a balance scale and freedom 
to experiment with it at will, one will be unable even to achieve balance by use of 
unequal weights, while the other, after experimenting, will propose a general rule 
enabling him to predict equilibrium under varying circumstances” (pp. 139-140). 


Whilst admitting the circumstantial nature of this evidence, and acknowledging 
that these three examples could be explained by the placement of students (including 
postgraduates) in grossly inappropriate learning situations, such explanations are far 
from persuasive. 


It seems that if a curriculum were to be ordered in terms of cognitive development, 
this would be based on assumptions that were either untenable or unnecessary. In 
our view the whole discussion of stages is only tangential to the main educational 
issue. The crucial problem of curriculum sequence is unresolved. It surely makes 
more sense to tackle the problem directly and explore why children fail with particular 
materials and succeed with others. This research would demand a large scale 
programme in which psychologists worked with teachers using a wide range of clinical, 
experimental and survey techniques to identify the strategies of successful progress 
and the blockages of particular learners on various types of materials. What are 
required are taxonomies of pupils’ behaviours with real curriculum materials from 
the direct study of learning in classrooms. It is counterproductive to use procedures 
which help us to estimate how long we must wait before beginning to teach something. 
More valuable would be procedures which help us to bring forward the point of 
learning. 
A FINAL COMMENT BY SHAYER 


This paper set out to test the premise involved in the implication, “If there is no 
unitary construct underlying the various measures, their predictive validity could 
never be high.” ‘Utility’ was given a quantitative meaning in relation to this premise 
only. Brown and Desforges examined exactly the same assumption in their 1977 
paper, and reasonably asked whether the concepts of ‘equilibration’ and the 
‘hierarchically organised epigenetic system’ are based on data or whether they are 
merely based on ‘epistemological conviction’. In my paper the case is made (a) 
negatively, by showing that their use of the literature had been selective, and (b) 
positively, by producing some of the fresh quantitative data which the literature 
required if it were to be other than equivocal. This case Brown and Desforges 
concede by offering no defence. Instead they offer a fresh and different line of 
argument, and this requires a brief comment. 


When did you stop beating your wife? 

There was no mention in their 1977 paper of the work of those who have addressed 
the problem of matching tasks to learners’ attainments or abilities—for example that 
of Karplus in relation to the SCIS course, or of Harlen in the Science 5-13 materials, 
or of Dale and Tisher for the ASEP course, or, indeed, the work of Shayer. Nor did 
there need be: attention to content and construct validity and the reliability of 
measures is logically prior to work on the predictive validity of a theory. But it is 
hardly reasonable to make the absence of the latter a ground of criticism of work on 
the former. 


However, evidence on predictive validity has already been published. The book 
quoted (Lovell and Shayer, 1978, p. 120) was a secondary source, in which the original 
data had been collapsed for reasons of space. The primary source is the Shayer (1978) 
cited, and to this the reader is referred for a demonstration of the utility of the 
developmental construct, in the sense of predictive validity. Only two points will be 
made here. The correlation between the Piagetian group-test estimate of pupils’ 
cognitive level, and a test of their understanding of two months’ work in science was 
0.77 for one section of Nuffield Combined Science, and 0.78 for another section. 
This seems to me to be high enough to be of use. Below is given a table of the facili- 
ties obtained by one of the forms of pupils whose results contribute to the table 
which Brown and Desforges cite. The items all test aspects of physics, chemistry and 
biology analysed as making an Early Formal (3A) level of demand. 
TABLE 1 


FACILITY OF 3A ITEMS TESTING COMBINED SCIENCE SECTION 6 FOR 
TWO GROUPS OF PUPILS 


                                      Item numbers
Pupil level            1   14   1f   1h   1i   2h   3e   3f   3h   5d   6i   6j
Stage 3A (N = 8)      68   38   63   63   25    0   83   75   13   25   38   63
Stage 2B (N = 10)      0    0   20   20    0   10   20   20   10    0    0   30


Allowing for the usual imperfections of existence, most teachers would feel that 
the sample of activities which the items represent were well-chosen for the 3A pupils, 
and almost completely unsuitable for the 2B (Late Concrete) pupils. This degree of 
prediction supports substantially the theoretical model used. 


This is as far as disagreement goes. I do not regard Piaget's as a learning theory, 
and believe only that learning sequences must avoid gross mismatches between pupil 
and course. Since most existing science and maths courses are grossly mismatched 
to the majority of pupils, there is plenty of work to be done in this line first. But 
when it comes to the fine details of the formative evaluation of curricula, I entirely 
agree with the suggestions made for a variety of empirical approaches. I would only 
insist that a useful theory can effect an immense economy of research effort. Such 
further research is discussed in Shayer (1979). 


Lastly, may I make a plea? In the world of chemistry, it was about 1890 that 
the Chemical Society made a ruling that in future no papers would be published in 
their journal in which an argument was not established by fresh data. Perhaps it is 
too early to apply this principle throughout educational research. But perhaps a 
moratorium might be called on articles of comment and opinion in the area of 
Piagetian studies. However wrong-headed one may believe a theoretical approach to 
be, it is only in the design and operation of a crucial test that one gains insight into 
the real nature of the problem, and from such tests something new will usually be 
reported. It also alters the respect one bears to one's opponents. This area has seen 
too much heat, and too little fresh light. 
EFFECTS OF ILLUSTRATIONS ON EARLY ORAL 
READING ACCURACY, STRATEGIES AND 
COMPREHENSION 


By D. R. DONALD 
(Faculty of Education, University of Cape Town, South Africa) 


Summary. Previous conclusions regarding the negative effects of illustration on 
learning to read are challenged. The view that reading is a linguistically and contextually 
constrained process demands that the effects of illustration be evaluated in this context 
and not only on isolated word recognition. Clarity is also sought on the issue of 
comprehension. For 20 average, second-year readers, reading continuous text with and 
without illustration, it was found that illustration: (a) facilitated contextual word 
recognition accuracy, strategies reflecting use of contextual cues, the self-monitoring 
strategy of self-correction, and comprehension at the level of literal idea recall; (b) 
interfered with the use of graphic cues; and (c) had no effect on comprehension at the 
level of inference beyond literal content. 


INTRODUCTION 


It is generally accepted that children enjoy reading illustrated text: that illustrations 
have a clear motivational function in early reading material. This is supported by 
studies recording children’s choice of reading material—at least in the early stages of 
learning to read (Miller, 1938). Whether illustrations, cognitively, facilitate the 
process of learning to read is, by contrast, an unresolved question. It is not that the 
issue has been unresearched: rather, the conclusions which may be drawn from 
previous research are not only, in many cases, in apparent conflict but the issue itself 
has been insufficiently clarified. 


Samuels (1970), for instance, reviewed a considerable body of research pertaining 
to the effects of illustrations on learning to read. His conclusions were as follows: 


“1. The bulk of the research findings on the effects of pictures on acquisition of 
a sight vocabulary was that pictures interfere with learning to read. 

2. There was almost unanimous agreement that pictures, when used as adjuncts 
to the printed text, do not facilitate comprehension” (p. 405). 


These conclusions, ostensibly, resolve the issue. Illustrations, at best, have no effect 
on comprehension and, at worst, they actually interfere with learning to read. Is the 
issue, however, as simple as this? 


Where learning to read itself is concerned, Samuels’ first conclusion needs care-
ful qualification. The studies he quotes in support of this conclusion were all based 
on children’s recognition of isolated words—words out of continuous context. Even 
in his own study (1967) which attempted to use “a procedure which was similar 
to that used in actual classrooms” (p. 338), and where children read an illustrated/ 
unillustrated continuous passage, the dependent variable was whether words included 
in the text were recognised when presented in isolation after the reading. His 
theoretical standpoint is that illustrations compete for limited attention capacity and 
therefore, through the ‘principle of least effort’, lead to the learning of non-specific 
and often ‘wrong’ responses. There can be no argument with either the evidence or 
the interpretation where paired-associate learning is concerned: in so far as learning 
‘sight words’ can be regarded as learning to read, illustrations do interfere with the 
process. Keir (1970), from a practical point of view, supports this through her 
observations of the essentially confusing nature of children's illustrated dictionaries. 
Valuable as this conclusion is, there is a danger in over-generalising its import. 
Learning to read involves a great deal more than learning correct responses to sight 
words. The current emphasis on the linguistic, constructive nature of the reading 
process (Goodman, 1967; Smith, 1971; Gibson and Levin, 1975) highlights the 
importance of contextual constraints in this process: the information available in 
continuous text consists not only of graphic stimuli but also of rules of linguistic 
prediction in the orthographic, semantic and syntactic dimensions. Thus learning to 
read is more a matter of learning strategies of information processing than of learning 
specific responses to specific perceptual stimuli. The question of what effects illustra- 
tions have on learning to read, therefore, must be asked in the context of reading 
continuous text. This is particularly true since the normal function of illustrations 
in continuous text is not to cue specific words (as in the paired associate situation) 
but to relate to the general meaning context (Reid, 1976). Further, the measure of 
influence of an illustration cannot be on words recognised out of context (as in 
Samuels, 1967), since to do so is to exclude the very effect which an illustration might 
positively exercise. Clearly, the most relevant behaviour to consider is that which 
occurs during the actual reading of the illustrated/unillustrated continuous text. 


Where comprehension is concerned a number of more recent studies have shown 
that, under certain conditions, comprehension may indeed be facilitated through the 
presence of illustrations (Ketcham and Heath, 1962; Dwyer, 1970; Peeck, 1974). 
These studies, essentially, have shown that the effect of illustration will vary with: (a) 
the nature of the comprehension task; and (b) the relationship of relevance which 
exists between the text and the illustration. Indeed, a detailed analysis of the studies 
quoted by Samuels (1970) in support of his second conclusion reveals the same basic 
qualifications. The two key studies by Vernon (1953, 1954) are clear examples. She 
showed (1953) that although there was no overall difference between ‘points 
remembered’ with and without illustration, there was a significant difference between 
the recall of ‘major points’ directly illustrated and those same points unillustrated. 
Equally, Vernon (1954) noted that the amount recalled, where illustrations were 
serially ordered and closely relevant to the text, was consistently higher than where 
pictures were random, or only marginally relevant to the text. 


The aims, therefore, of the present study were seen as follows: first, the effect of 
illustration on the reading of continuous text required investigation. As pointed out, 
conclusions drawn on the basis of isolated word recognition may not apply to the 
skills required in reading contextually. Word recognition accuracy, in context, 
therefore, was selected as the most appropriate criterion measure. The effectiveness 
of the reader’s strategies in learning to read was also regarded as important. Two 
areas of strategy were selected: (a) linguistic cue-selection strategy, and (b) self- 
monitoring strategy. The former reflects the extent to which the reader makes use 
of graphic, syntactic and semantic cues in the information available to him. Essenti-
ally, reading errors are analysed not as ‘wrong responses’ but as responses which 
may be partially correct. Such analyses reveal characteristic patterns of response: 
the reader's strategy in attempting to make use of, and compromise, the three 
redundant sources of information. This concept of strategy is now well documented 
(Weber, 1968; Biemiller, 1970; Goodman, 1973; Cohen, 1975; Burke, 1976). 
Self-monitoring strategy reflects the extent to which the reader recognises that he has 
misconstrued the information available and is able to re-construct his response. 
Actual, spontaneous self-corrections in the course of reading are taken as evidence of 
such a strategy and have been shown to be related to the optimal development of 
reading competence (Clay, 1969; Weber, 1970b; Cohen, 1975). 


On the grounds that the information present in a relevant illustration provides 
an enhanced contextual set for the reader, it was predicted that both accuracy and 
strategies reflecting the use of semantic and syntactic information would be facilitated 
by the presence of illustration. Equally, since the information available in an illustration 
may be regarded as complementary to (redundant within) the information 
structure of the text, it was predicted that it would act as an additional source of 
information check: that a self-monitoring strategy would be facilitated through the 
presence of illustration. Conversely, since an inverse relationship between the use of 
contextual (semantic and syntactic) information and graphic information appears to 
exist (Tulving and Gold, 1963; Weber, 1970a) it was predicted that the presence of 
illustration would reduce the use of graphic information. 


The second aim was to clarify the effects of illustration on different types of 
comprehension task. Two levels of comprehension (Clymer, 1972) were selected as 
particularly relevant within the methodology of early reading instruction: (a) recall 
of the main ideas in the text; and (b) inference beyond the literal content of the text. 
On the grounds that the processing of information in an illustration is essentially 
organised spatially, through parallel association rather than through logical, sequential 
processing (Paivio, 1971), it was predicted that information in an illustration, through 
being integrated with the literal or ‘face-value’ content of the text, would facilitate 
the recall of main ideas. Conversely, inference, which essentially requires logical, 
sequential processing beyond the literal content of the text, would be unaffected by 
the presence of illustration. 


METHOD 

Design 

Owing to the variety of dependent variables under consideration, a conventional 
matched samples design was rejected in favour of a latin-square design. This involved 
two randomly selected groups each reading two matched but different stories, A and 
B; the first group reading story A with illustration and story B without, the second 
group reading story B with illustration and story A without. The results for each 
dependent variable were subjected to a two-factor analysis of variance with repeated 
measures. The within-subjects main effect, illustrated versus unillustrated reading, 
was of principal interest. The interaction, however, in so far as it might reflect 
variance specific to the relationship of the particular story and the illustration 
condition, was of equal relevance to interpretation. 
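
As a rough illustration of the analysis described above, the following Python sketch computes the F ratios for a design with one between-subjects factor (group) and one repeated factor (illustration condition). It is a minimal sketch assuming equal group sizes; the function name, the data layout and the example scores are hypothetical and are not taken from the study. Because each group met a different story under each condition, the group × condition term in this layout also carries any difference between the two stories, which is why the interaction bears on interpretation.

```python
from statistics import mean


def mixed_anova(groups):
    """Two-factor ANOVA with repeated measures on the second factor.

    groups: list of groups; each group is a list of subjects; each subject
    is a tuple of scores, one per within-subject condition.
    Assumes a balanced design (same number of subjects in every group).
    """
    a = len(groups)            # levels of the between-subjects factor
    n = len(groups[0])         # subjects per group
    b = len(groups[0][0])      # levels of the repeated factor
    scores = [x for g in groups for subj in g for x in subj]
    grand = mean(scores)
    g_mean = [mean([x for subj in g for x in subj]) for g in groups]
    c_mean = [mean([subj[j] for g in groups for subj in g]) for j in range(b)]
    cell = [[mean([subj[j] for subj in g]) for j in range(b)] for g in groups]

    # Sums of squares for each source of variance.
    ss = {
        "group": b * n * sum((m - grand) ** 2 for m in g_mean),
        "subjects_within_groups": b * sum(
            (mean(subj) - g_mean[i]) ** 2
            for i, g in enumerate(groups) for subj in g),
        "condition": a * n * sum((m - grand) ** 2 for m in c_mean),
        "interaction": n * sum(
            (cell[i][j] - g_mean[i] - c_mean[j] + grand) ** 2
            for i in range(a) for j in range(b)),
        "error_within": sum(
            (subj[j] - mean(subj) - cell[i][j] + g_mean[i]) ** 2
            for i, g in enumerate(groups) for subj in g for j in range(b)),
        "total": sum((x - grand) ** 2 for x in scores),
    }
    df_between_error = a * (n - 1)
    df_within_error = a * (n - 1) * (b - 1)
    ms_between_error = ss["subjects_within_groups"] / df_between_error
    ms_within_error = ss["error_within"] / df_within_error
    f = {
        "group": (ss["group"] / (a - 1)) / ms_between_error,
        "condition": (ss["condition"] / (b - 1)) / ms_within_error,
        "interaction": (ss["interaction"] / ((a - 1) * (b - 1))) / ms_within_error,
    }
    return ss, f, (df_between_error, df_within_error)


# Hypothetical scores: (unillustrated, illustrated) for three children per group.
example = [
    [(80.0, 84.0), (85.0, 86.0), (88.0, 90.0)],
    [(78.0, 88.0), (82.0, 85.0), (83.0, 92.0)],
]
ss, f, dfs = mixed_anova(example)
```

With two groups of ten children and two conditions, both error terms in this layout would have 18 degrees of freedom.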


Sample 

A sample of 20 children was randomly selected from a population of average, 
second year readers, ranging in age from 6:10 to 7:4, in three schools in Exeter. 
‘Average’ was defined, in the first place, on the rough criterion of falling within the 
middle 50 per cent of reading achievement for that year group. Since variations 
between schools could be expected on this criterion, a further screen was applied in which only 
those scoring, on accuracy, within one standard deviation of the mean for the whole 
group on a parallel form of the stories to be read were included. The sample of 20 
was then randomly allocated between the two reading groups. 


Materials 

The materials consisted of two stories and two illustrations mounted on white 
card, 24 × 15 cm. The stories were structurally matched in so far as each was 66 
words long; the content consisted of an everyday narrative account; the readability 
index (Spache, 1953) was 1.9; simple, active syntactic structure alone was used; and 
one sentence per line, plus uniform print size, etc., was used. The illustrations were 
clear line drawings in pen and ink and each illustration was required, in terms of 
relevance, to depict the central event in the story, as well as to represent the central 
agents and physical setting of the story. Relevance was also prescribed in the ratio 
of length of text to illustration, in each case, 66:1. 

Procedure 

The order of presentation, story A first or story B, was randomly determined 
for each child. Tape recordings of oral reading and comprehension responses were 
made and subsequently transcribed for scoring and analysis. Scoring of the seven 
dependent variables was, briefly, as follows: 

Accuracy. Each misinterpretation of the text, at the whole word level, con- 
ventionally excluding self-corrected responses and repetitions, was counted as one 
error. The remaining correct responses constituted the percentage accuracy. 

Syntactic Acceptability. Each error (including responses subsequently self-corrected 
as well as repetitions, all of which were taken as legitimate indicators of 
strategy) was scored on a five-point scale for its degree of syntactic acceptability 
within the cumulative linguistic structure of the text. 

Criteria and scores for the relative syntactic acceptability of errors were: 

4: Acceptable within both the full sentence and the passage as a whole. 
e.g., She was hurt [text: lost] and lonely. 

3: Acceptable within the full sentence, but not the passage as a whole. 
e.g., She is [text: was] lost and lonely. 

2: Acceptable in the sentence, to the point of the error only. 
e.g., She came [text: was] lost and lonely. 

1: Acceptable only in that it has the same grammatical function as the stimulus 
word. 
e.g., She were [text: was] lost and lonely. 

0: None of the above. 


The total syntactic acceptability score, based on all the subject’s errors in any one 
condition, was expressed as a percentage of the total possible score for those errors. 
The final percentage reflected, therefore, the ‘average’ syntactic acceptability of a 
particular set of errors. The same applied to semantic and graphic acceptability. 
Over all conditions and all subjects the total number of errors analysed on each 
acceptability scale was 479: a mean of 12 errors per subject per analysis with a range 
limited by the initial screening. In terms of precedent (Burke, 1976; Hood and 
Kendall, 1975) this could be regarded as an acceptable minimum but, in so far as the 
number of errors per analysis does affect reliability, some reservation is called for. 
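
The percentage score described above can be sketched as follows. This is a minimal illustration assuming each error has already been rated 0 to 4 on one of the scales; the function name and the example ratings are hypothetical.

```python
def acceptability_percentage(error_scores, max_score=4):
    """Total score for a set of errors as a percentage of the total possible.

    error_scores: one rating per error (0..max_score) on a single scale.
    """
    if not error_scores:
        raise ValueError("at least one scored error is required")
    if any(not 0 <= s <= max_score for s in error_scores):
        raise ValueError("ratings must lie between 0 and max_score")
    total_possible = max_score * len(error_scores)
    return 100.0 * sum(error_scores) / total_possible


# Hypothetical ratings for five errors made by one child in one condition.
example_percentage = acceptability_percentage([4, 3, 2, 1, 0])
```

Because the denominator grows with the number of errors, the result is an average rating per error rescaled to 0-100, not a count of errors.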


Semantic Acceptability. Each error was scored on a five-point scale for its degree 
of semantic acceptability within the cumulative meaning of the text. Criteria and 
scores for the relative semantic acceptability of errors were: 


4: Fully synonymous with textual meaning. 
e.g., Father doesn’t [text: does not] tell her. 

3: Marginal but non-substantive change in meaning. 
e.g., Now she stays in our house [text: home]. 

2: Meaning acceptable to the cumulative sequence of ideas. 
e.g., There is pretty ribbon [text: paper] round the box. (Story of a party and getting a 
present.) 

1: Meaning having a loose association with general textual content. 
e.g., There is pretty [response illegible] [text: paper] round the box. 

0: None of the above. 


Graphic Acceptability. Each error was scored on a five-point scale for its degree 
of graphic acceptability to the stimulus word. Criteria and scores for the relative 
graphic acceptability of errors were: 


4: Total correspondence. 
e.g., heard/heard (as in beard) 


3: Differing by only one letter (including reversals). 
e.g., stays/says or was/saw 

2: Having first and second or first and last letter in common. 
e.g., until/under or stands/slips 


1: Having only first letter in common. 
e.g., table/top 


0: None of the above. 


Note: In all three of the above scales specific criteria for scoring ‘exceptional’ 
errors including refusals, omissions, insertions, nonsense words, errors at the beginning 
of a sentence, punctuation errors, etc., were also specified in detail in order to maximise 
reliability of scoring. 
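
The graphic scale is mechanical enough to be sketched in code. The following is an illustrative approximation only: the function names are hypothetical, ‘differing by only one letter’ is assumed to cover a single substitution, insertion or deletion, a ‘reversal’ is assumed to mean the whole word read backwards, and the detailed rules for exceptional errors mentioned in the Note are not modelled.

```python
def differs_by_one_letter(a, b):
    """True if a and b differ by one substituted, inserted or deleted letter."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    # Try deleting each letter of the longer word in turn.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def graphic_score(stimulus, response):
    """Rate a response 0-4 for graphic similarity to the stimulus word."""
    s, r = stimulus.lower(), response.lower()
    if not s or not r:
        return 0
    if s == r:                       # same spelling, e.g. mispronounced
        return 4
    if differs_by_one_letter(s, r) or s == r[::-1]:
        return 3
    same_first = s[0] == r[0]
    same_second = len(s) > 1 and len(r) > 1 and s[1] == r[1]
    same_last = s[-1] == r[-1]
    if same_first and (same_second or same_last):
        return 2
    if same_first:
        return 1
    return 0
```

Applied to the examples quoted in the scale, this sketch gives 4 for heard/heard, 3 for stays/says and was/saw, 2 for until/under and stands/slips, and 1 for table/top.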


Self-Correction. The number of spontaneous self-corrections of whole word 
errors was expressed as a percentage of the total number of errors. 
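
The accuracy and self-correction measures reduce to two simple ratios, sketched below. The function names and example counts are hypothetical, and it is an assumption of this sketch that the ‘total number of errors’ used for the self-correction rate includes the errors that were subsequently self-corrected.

```python
def percentage_accuracy(words_in_text, uncorrected_errors):
    """Correct responses as a percentage of the words in the text.

    uncorrected_errors excludes self-corrected responses and repetitions.
    """
    if words_in_text <= 0 or not 0 <= uncorrected_errors <= words_in_text:
        raise ValueError("invalid counts")
    return 100.0 * (words_in_text - uncorrected_errors) / words_in_text


def self_correction_rate(self_corrections, total_errors):
    """Spontaneous self-corrections as a percentage of all errors made."""
    if total_errors <= 0 or not 0 <= self_corrections <= total_errors:
        raise ValueError("invalid counts")
    return 100.0 * self_corrections / total_errors


# Hypothetical child reading a 66-word story: 9 errors left standing,
# and 2 self-corrections out of 11 errors in all.
accuracy = percentage_accuracy(66, 9)
correction_rate = self_correction_rate(2, 11)
```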


Comprehension—Idea Recall. The structure of the text allowed each sentence 
to represent one idea. The child’s spontaneous recall of the story was analysed into 
corresponding units each of which was scored on a four point scale, the criteria of 
which were: 


3: Identical or synonymous with text idea. 
e.g., I gave her something to drink. (text) 
‘He brought the cat a drink of something.’ (response) 

2: All essential elements of text idea present. 
e.g., ‘He put a bowl of milk down for her’. 

1: Some element(s) of text idea present. 
e.g., ‘He gave her some food’. 

0: Irrelevant or incorrect. 
e.g., ‘He gave the cat a pat’. 


Note: Examples given are deliberately ‘borderline’: most responses were, in 
fact, more easily classifiable. 


The total score was expressed as a percentage of the total possible score. 


Comprehension-Inference. Three questions of the order ‘Why – – –’ or ‘How 
do you know – – –’ followed each story. Responses to these questions were scored 
as 3, 1 or 0 depending on the relevance of the response. Total score was again 
expressed as a percentage of total possible score. 


RESULTS 

Accuracy. Over combined groups the mean accuracy for the illustrated condition 
was 86.67 per cent and, for the unillustrated condition, 82.73 per cent. This difference 
was statistically significant (F = 6.14, P < 0.025). The difference between groups was 
non-significant (F = 0.01). An interaction (F = 5.30, P < 0.05), however, was present. 
Examination of the component means suggests two alternative interpretations for the 
interaction (Table 1). 

TABLE 1 

CELL MEANS FOR PERCENTAGE ACCURACY 

             Unillustrated        Illustrated 
Group 1      Story B   84.39      Story A   84.70 
Group 2      Story A   81.06      Story B   88.64 


One possibility is that Group 2 was affected by illustration while Group 1 was not. 
This seems unlikely in terms of both the random selection of groups and the other 
results. The other possibility is that accuracy was as much affected by the particular 
story as by the presence of illustration. Story B, in other words, was read with greater 
accuracy than story A; but the reading of both was facilitated through the presence 
of illustration. This is the most likely interpretation. It also preserves the significance 
of the main effect. 
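
The two readings of the interaction can be checked against the cell means in Table 1. The sketch below averages the cells by condition and by story; it assumes the two groups were of equal size, so that an unweighted mean of cell means is appropriate, and the variable and function names are hypothetical.

```python
# Cell means from Table 1: (group, condition) -> (story read, mean % accuracy)
cells = {
    (1, "unillustrated"): ("B", 84.39),
    (1, "illustrated"): ("A", 84.70),
    (2, "unillustrated"): ("A", 81.06),
    (2, "illustrated"): ("B", 88.64),
}


def marginal_mean(select):
    """Unweighted mean of the cell means picked out by `select`."""
    values = [m for key, (story, m) in cells.items() if select(key, story)]
    return sum(values) / len(values)


illustrated = marginal_mean(lambda key, story: key[1] == "illustrated")
unillustrated = marginal_mean(lambda key, story: key[1] == "unillustrated")
story_a = marginal_mean(lambda key, story: story == "A")
story_b = marginal_mean(lambda key, story: story == "B")

print(f"illustrated {illustrated:.3f}, unillustrated {unillustrated:.3f}")
print(f"story A {story_a:.3f}, story B {story_b:.3f}")
```

On these figures the condition means agree with the values reported in the text, and story B averages several points higher than story A, which is the pattern the second interpretation relies on.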


Syntactic Acceptability. Over combined groups the mean syntactic acceptability 
for the illustrated condition was 51.49 per cent and, for the unillustrated condition, 
37.46 per cent. This difference was statistically significant (F = 7.33, P < 0.025). 
Neither the difference between groups nor the interaction was significant, thus 
leaving the interpretation of the main effect clear. Illustration facilitates the use of 
syntactic cues. 


Semantic Acceptability. Over combined groups the mean semantic acceptability 
for the illustrated condition was 44.24 per cent and, for the unillustrated condition, 
27.33 per cent. This difference was statistically significant (F = 16.02, P < 0.001). 
Neither the difference between groups nor the interaction was significant, leaving the 
interpretation again clear. Illustration, quite dramatically, facilitates the use of 
semantic cues. 


Graphic Acceptability. Over combined groups the mean graphic acceptability 
for the illustrated condition was 35.25 per cent and, for the unillustrated condition, 
39.95 per cent. This difference was statistically significant at the P < 0.05 level 
(F = 4.46). The other effects, again, were non-significant and interpretation of the 
predicted reverse effect of illustration on graphic acceptability is clear. 


Self-Corrections. Over combined groups the mean self correction rate for the 
illustrated condition was 22.22 per cent and, for the unillustrated condition, 10.49 
per cent. This difference was statistically significant (F = 13.07, P < 0.01). The 
other two effects were both non-significant leaving a clear interpretation of the 
facilitatory effects of illustration on this key self-monitoring strategy. 


Comprehension-Idea Recall. Over combined groups the mean idea recall for the 
illustrated condition was 26.39 per cent and, for the unillustrated condition, 18.75 
per cent. This difference was statistically significant (F = 9.85, P < 0.01). Neither of 
the other effects was significant, yielding, once more, a clear interpretation of the 
facilitating effects of illustration on this level of comprehension. 


Comprehension—Inference. Over combined groups the mean inference score for 
the illustrated condition was 68.33 per cent and, for the unillustrated condition, 53.33 
per cent. This difference was not statistically significant (F = 3.28); neither were the 
other two effects. This supports the prediction that comprehension at the level of 
inference is relatively unaffected by illustration. 


DISCUSSION 


In the light of previous conclusions regarding the effects of illustrations on 
learning to read, these results present an interesting array. First, although isolated 
word recognition may be adversely affected by illustration, word recognition in 
context would appear, by contrast, to be facilitated. In relation to this, the inverse 
pattern of the linguistic cue selection strategies (semantic and syntactic vs. graphic) 
is particularly interesting. It would appear that contrary to common assumption, 
a child may read with greater accuracy through maximising his use of contextual 
information at the cost of some graphic information. This is in accord with the 
efficiency of information processing model of reading where only those cues necessary 
to uncertainty reduction are selected from the redundant sources of information 
(Smith, 1971). As the results suggest, illustration serves particularly to re-inforce the 
semantic constraints in the text and thus to facilitate the efficient use of contextual 
information. Significantly, however, the inverse role of illustration on the use of 
graphic cues is also in accord with Samuels’ (1970) interpretation of the interfering 
effect of illustration where isolated word recognition is concerned. 


Context cues are not available in isolated words, so the child must depend on 
graphic cues. In this situation the illustration acts as a distractor from the only 
reliable source of information. In contextual reading, on the other hand, the illus- 
tration distracts from graphic information in the same way, but actually complements, 
and facilitates the use of, alternative (redundant) contextual information. 


Nevertheless, the argument that loss of attention to graphic cues—through the 
influence of illustration—may, of itself, encourage a maladaptive reading strategy 
should be considered. It is possible. However, accuracy, it will be recalled, is 
facilitated: more words are, in fact, correctly read. It follows that the correct 
graphic correspondence rules are more likely to be acquired under these conditions 
even if fewer graphic cues are actively used to decode words. Moreover, the results 
indicate that a self-monitoring strategy (self-correction) is also facilitated by illustration. 
Since this strategy has been shown to be related to optimal reading development, 
it is likely to counter any possible danger in the under-use of graphic cues. 
Further, the finding that comprehension (the ultimate criterion of reading competence) 
is facilitated on the level of idea recall, and at least not adversely affected on the 
level of inference, would suggest that however illustration might affect attention to 
graphic cues, understanding of the text is certainly not adversely affected. 


Despite the general facilitatory effects of illustration which these results appear 
to indicate, certain constraints need to be placed on interpretation. First, this study 
is relatively small in terms of both numbers of subjects and numbers of errors for 
analysis. The conclusions, therefore, are tentative and, since the issue has important 
implications, replication is perhaps desirable. 


It is clear, second, that the relationship of relevance between text and illustration 
is delicate. More research is required to clarify the critical variables in this relation- 
ship, and the conclusions drawn from this study are limited to relevant illustrations— 
as defined: they cannot be generalised to all illustrations. Indeed, if some of the 
care that goes into structuring the readability of the text in certain reading schemes 
were also to go into structuring relevance between text and illustration, considerable 
benefit might accrue to the learners. Third, this study focused on second-year, 
average readers. There is evidence that strategies not only change developmentally 
(Biemiller, 1970; Cohen, 1975) but that they also differ for good and poor readers 
(Clay, 1969; Cohen, 1975). It is highly likely that illustration would have different 
effects on both strategy and performance at different levels both of reading development 
and of competence. This also requires investigation and limits interpretation 
of the present results. Finally, in practical, remedial terms, the results also suggest 
that a child who tends to over-use contextual information (‘the wild guesser’) may 
not benefit from the presence of illustration. For such a child a strategy of greater 
attention to graphic cues would clearly be necessary and the removal of illustrations 
indicated. 


In conclusion, the effects of illustration on learning to read are complex and, as 
yet, far from explicitly clarified. What is clear, however, is that further research in 
this area must investigate in the context of continuous prose, and must guard rigorously 
against over-generalising its conclusions. 


THE INTERPRETATION OF PRONOMINAL REFERENCE 
BY RETARDED AND NORMAL READERS 


By B. W. J. DALGLEISH AND SUSAN ENKELMANN 
(Department of Psychology, University of Queensland, Australia) 


Summary. Retarded and normal readers were required to interpret two-clause sentences 
in which the pronominal subject of one clause agreed with the named subject of the other. 
If the pronominal subject appears in the main clause, the subjects cannot have the same 
identity. If it is in the subordinate clause, the subjects may have the same or different 
identities. Some sentences followed an indirect speech format in which the subordinate 
clause is usually preceded by the main; others contained subordinate clauses headed by 
an adverb, which are not fixed in relation to the main clause. A simple order-based 
solution sometimes adopted by young children avoids the main and subordinate distinction: 
if the pronominal subject precedes the named subject its reference is restricted to 
a different identity (ID2). The retarded subjects produced significantly fewer ID2 interpretations 
when these were obligatory, confirming that syntactic problems may be implicated 
in reading retardation. The group × set interaction was not significant, although the 
normal readers were too advanced to apply the order strategy. Since the retarded 
readers did not spontaneously apply the order strategy, the possibility of teaching it 
where the main/subordinate distinction is too difficult to operationalise is raised. 


INTRODUCTION 


THE search for correlates of severe reading disabilities among oral language skills has 
been prompted by reports of language deficits in children of normal intelligence who 
enter remedial reading programmes, e.g., Klasen (1972). The most intensive study of 
the oral syntactic abilities of dyslexic children was conducted by Vogel (1975), whose 
test battery comprised nine syntactic, six reading, one receptive vocabulary and two 
auditory memory measures. The syntactic battery included specially devised experi- 
mental tests, recognition of melody pattern, recognition of grammaticality, sentence 
repetition and two sentence completion tasks. There were also two morphology 
measures from the Berry and Talbott Language Tests (1966) and the Illinois Test of 
Psycholinguistic Abilities (Kirk et al., 1968), Lee and Canter's (1971) Developmental 
Sentence Scores and a comprehension measure from the Northwestern Syntax 
Screening Test (Lee, 1969). 


For seven of the syntactic tests Vogel found impaired performance among the 
dyslexic children. The test of recognition of grammaticality, an ad hoc device 
concerned with rather literary aspects of prescriptive grammar, absence of which 
would not necessarily be expected seriously to impair comprehension, failed to 
discriminate between dyslexic and normal readers. Less understandable, in view of 
the retardation of so many aspects of language structure, was the lack of discrimination 
by the NSST comprehension scale, although it may be noted that Vogel's combined 
group mean was approximately one standard deviation below the manual norms 
provided by Lee (1969). 


The present experiment was designed to test comprehension across major 
segmental boundaries, an aspect of syntax raised by generative grammars (e.g., 
Chomsky, 1969), and to relate the comprehension problem to other correlates of 
reading retardation by choosing a problem in which ordering strategies may be 
adopted. 


Resolving the reference of pronouns is one of the most ubiquitous and complex 
tasks facing the listener, and consequently has received much attention from linguists, 
e.g., Hobbs (1978), Dik (1968). The problems selected for study followed those first 
articulated in a behavioural context by Chomsky (1969). Two clause sentences 
comprising a main and a subordinate clause may include subject noun phrases in 
both clauses. If one subject is a pronoun which matches the remaining subject in 
number and gender the listener must identify their referents as the same (ID1 response) 
or different (ID2 response). The rules of English syntax allow either identification if 
the subject of both clauses is pronominal (He thought he was ill), or if the subject of 
the subordinate clause is a pronoun and the subject of the main clause a noun (John 
thought he was ill). Occupancy of both subject positions by the same noun is 
permissible only if, coincidentally, two identities share the same name or common 
class label. Even if these conditions are met, speakers normally avoid the clumsy 
sentences which result. If the identities are the same, at least the subject of the 
subordinate clause must be pronominalised. When the subject of an anterior main 
clause is pronominalised and the subject of the subordinate clause not, the reference 
must be to different entities (He recovered before John was ill). Examples of sentences 
with pronominal reference problems due to number and gender agreement of the 
subjects of their constituent clauses are given in Table 1. 


TABLE 1 

TYPES OF PRONOMINAL REFERENCE PROBLEMS 

Sentence                                    Reference permitted 

Indirect speech format 
  John said he was tired                    ID1 or ID2 
  He said John was tired                    ID2 

Adverbial clause format 
  After he came home John served dinner     ID1 or ID2 
  John served dinner after he came home     ID1 or ID2 
  He served dinner after John came home     ID2 
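
The rule summarised in Table 1, and the simpler order-based solution mentioned in the Summary, can be set side by side in a short sketch. The encoding of each sentence (which clause holds the pronoun, and whether the pronoun comes first) and the function names are hypothetical conveniences; the sketch also assumes that the order strategy leaves both readings open when the pronoun follows the named subject, which the text does not state explicitly.

```python
# (sentence, clause containing the pronoun, does the pronoun precede the noun?)
TABLE_1 = [
    ("John said he was tired", "subordinate", False),
    ("He said John was tired", "main", True),
    ("After he came home John served dinner", "subordinate", True),
    ("John served dinner after he came home", "subordinate", False),
    ("He served dinner after John came home", "main", True),
]


def permitted_by_syntax(pronoun_clause):
    """Readings allowed when the pronoun matches the noun in number and gender."""
    if pronoun_clause == "main":
        return {"ID2"}              # different identities are obligatory
    return {"ID1", "ID2"}           # ambiguous: either reading is possible


def order_strategy(pronoun_first):
    """Order-based shortcut that ignores the main/subordinate distinction."""
    if pronoun_first:
        return {"ID2"}
    return {"ID1", "ID2"}           # assumption: no restriction in this case


for sentence, clause, pronoun_first in TABLE_1:
    allowed = permitted_by_syntax(clause)
    chosen = order_strategy(pronoun_first)
    # For the Table 1 sentences the shortcut never yields a forbidden reading.
    print(f"{sentence!r}: syntax {sorted(allowed)}, order strategy {sorted(chosen)}")
```

For these five sentence types the order strategy is conservative: it gives the obligatory ID2 reading wherever syntax requires it, at the cost of ruling out a permissible ID1 reading when a subordinate clause with a pronominal subject comes first.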


The first problem which very young children encounter when interpreting 
sentences of the type presented in Table 1 is to ascertain whether the noun and 
pronominal subjects match in gender and number before assuming identity of 
reference. Once this elementary stage has been mastered the problem becomes one 
of avoiding the assumption that matching subject forms are a sufficient criterion for 
matching identities. Children who cannot fully distinguish between the necessity of 
number and gender matching and its sufficiency for the assumption of identical 
reference may reveal patterns of interpretation which differ in two aspects from 
children who can. They will erroneously assign ID1 interpretations to sentences in 
which the pronoun is the main clause subject, and the frequency of choice of the 
ID2 option for ambiguous sentences in which the pronoun is the subject of the 
subordinate clause will be reduced. 


METHOD 

Sample 

There were 25 reading retarded subjects, five at each age level from 8 to 12 
years. These children, who were selected on the basis of records held by the Guidance 
and Special Education Branch of the Queensland Department of Education, each 
met the following criteria: a Binet or WISC IQ of 90 or above, enrolment in a 
remedial reading class, reading test results which indicated confusion, distortion or 
reversals of letters, no diagnosed brain damage or defects of visual or auditory 
acuity, and they came from an English-speaking home. 

The control subjects were selected from the middle third of standard classrooms, 
and were matched to the retarded group for age and sex. Forty-two of the 50 subjects 
were boys. The two groups did not differ significantly in intelligence. 


The test 

The test consisted of 4 sets of 12 two-clause sentences, one clause with a pro- 
nominal subject matching in number and gender the nominal subject of the remaining 
clause. There was also a set of 12 preliminary items, designed to test identification 
of the characters and understanding of the procedure, understanding of the sentences 
when there was no pronominal reference problem, and understanding when the 
pronoun did not agree with the noun subject in gender or number. Half the sentences 
were of indirect speech format, and half of adverbial clause format. Each sentence 
with a pronoun in the subordinate clause and noun in the main clause was matched 
by a parallel sentence with the subjects reversed, thus providing one set of ambiguous 
and one set of ID2 sentences within each format. 


Procedure 

One or both of the subject options of each sentence was selected from the Disney 
characters, Mickey Mouse and Donald Duck. In some sentences an alternative 
figure, such as a doctor or a policeman, formed one option. Each character was 
represented by a doll, and the appropriate pair of dolls was placed in random order 
in front of the child before each sentence was read to him. This added interest to 
the task for the younger children and served to emphasise that the choice was always 
between two characters even if only one was named in the sentence. 


After each test sentence was read to the child, he was asked to indicate which 
character performed the action designated by the verb of the clause with the pro- 
nominal subject. In those preliminary items which lacked two clauses the main verb 
was used. Where there were two clauses but no pronoun, on one occasion the 
dependent verb and on the other the main verb was used. The children could respond 
verbally or by an action such as pointing. Before the test the children were required 
to identify each character by pointing. Almost all the responses to the test items were 
verbal. The items were administered in a predetermined order consisting of preliminary 
items, six adverbial format items, 24 indirect speech format items, and 18 
adverbial items. The sentences were randomised within each of the test blocks, with 
the exception that parallel sentences could not be consecutive. The same order of 
presentation was used for all subjects. 
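
One simple way to produce an order of this kind is to reshuffle a block until no sentence is immediately followed by its parallel. The sketch below illustrates that idea only; it is not a description of how the original order was actually drawn up, and the function name, sentence labels and seed are hypothetical.

```python
import random


def shuffle_block(sentences, parallel_of, rng, max_tries=10000):
    """Return a random order in which no sentence is next to its parallel."""
    order = list(sentences)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(parallel_of.get(x) != y for x, y in zip(order, order[1:])):
            return order
    raise RuntimeError("no acceptable order found")


# Hypothetical block of six items: three sentences and their parallels.
block = ["s1a", "s1b", "s2a", "s2b", "s3a", "s3b"]
parallel_of = {}
for i in (1, 2, 3):
    parallel_of[f"s{i}a"] = f"s{i}b"
    parallel_of[f"s{i}b"] = f"s{i}a"

# A fixed seed gives one predetermined order, reusable for every subject.
order = shuffle_block(block, parallel_of, random.Random(1))
```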


RESULTS 
Group differences 

Only five errors were observed among 600 responses to the preliminary items. 
The groups were equivalent in their understanding of the sentence forms when there 
was no pronominal reference and when there were number and gender cues to the 
reference of the pronoun. 

The performance of the reading retarded subjects differed from that of normal 
readers when the pronominal subject matched the named subject in number and 
gender, F(1) = 6.95, P < 0.05. The means presented in Table 2 indicate that at all 
but one age level for each format (nine years for IS, ten for AC) retarded readers 
produced fewer correct (ID2) interpretations. 

The identity preferences for ambiguous items (Table 3) also differed, with re- 
tarded readers choosing fewer ID2 options for both IS and AC formats with the 
exception of the nine-year-olds. 

TABLE 2 

MEAN CORRECT (ID2) INTERPRETATIONS OF UNAMBIGUOUS SENTENCES WITH INDIRECT 
SPEECH (IS) AND ADVERBIAL CLAUSE (AC) FORMATS 

          Retarded readers      Normal readers 
Age       IS        AC          IS        AC 
8          6.8       7.4        11.2      10.8 
9         11.0      10.0        10.0      10.2 
10        10.8      10.8        11.6      10.4 
11         8.0       6.8        10.0       9.8 
12        11.0       9.4        11.6      10.8 
Mean       9.5       8.8        10.8      10.4 


TABLE 3 

MEAN ID2 INTERPRETATIONS OF AMBIGUOUS SENTENCES WITH INDIRECT SPEECH (IS) AND 
ADVERBIAL CLAUSE (AC) FORMATS 

          Retarded readers      Normal readers 
Age       IS        AC          IS        AC 
[table values missing] 


Reference to individual consistencies is helpful in the interpretation of group 
performances. Raw scores were classified according to whether the child was a 
mixed responder (four to eight ID2 responses), or predominantly an ID1 or ID2 
responder (nine or more responses per set). For both sets requiring ID2 responses, no normal 
readers were predominantly users of the wrong (ID1) rule. Three reading retarded 
children were ID1 users for the IS set and one for the AC set. For the ambiguous 
sets one normal reader displayed an ID1 preference for the IS set and five for the AC 
set. In both cases more reading retarded children were ID1 users, five for IS and 
eight for AC. Conversely there were fewer ID2 users among the reading retarded, 
eight and none respectively for IS and AC sets, than among the normal readers of 
whom 13 and 5 demonstrated disregard for number and gender matching as a 
sufficient basis for assuming a single identity. 
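
The classification of individual children can be written as a small rule. The sketch below assumes sets of 12 items in which every response is either ID1 or ID2; the function name and the parameter defaults are hypothetical labels for the cut-offs given in the text.

```python
def classify_responder(id2_count, set_size=12, threshold=9):
    """Classify a child on one set from the number of ID2 responses given."""
    if not 0 <= id2_count <= set_size:
        raise ValueError("id2_count must lie between 0 and set_size")
    if id2_count >= threshold:
        return "ID2"                 # predominantly an ID2 responder
    if set_size - id2_count >= threshold:
        return "ID1"                 # predominantly an ID1 responder
    return "mixed"                   # four to eight ID2 responses


# Hypothetical ID2 counts for three children on one set.
labels = [classify_responder(n) for n in (2, 6, 11)]
```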


Set differences 

The differences between sets were highly significant, F(3) = 46.575, P < 0.01. 
In interpreting sentences with an AC format both retarded and normal readers at 
every age level used more ID2 responses when these were obligatory than when 
merely optional. This was also true of the IS format with the exception that the 
youngest group of retarded readers did not demonstrate the expected trend. Com- 
parison of the obligatory sets in both formats showed that the normal readers gave 
fewer correct responses to the AC than the IS items except at nine years. The same 
pattern was sufficiently strong among the 9-, 11- and 12-year-old retarded readers for 
this trend to be marginally greater for the retarded than the normal readers. For the 
optional sets both groups consistently produced more ID2 interpretations to the IS 
than to the AC format. Consequently the group × set interaction was not significant, 
F(1, 3) = 0.071, P > 0.05. 


Age differences 

Neither the age, F(4) = 0.775, nor the age × groups interaction, F(4, 1) = 1.027, 
proved statistically significant. The normal readers by eight years had mastered the 
obligatory rules almost to the ceiling of the test for both formats. The youngest 
reading retarded group performed poorly on the obligatory rules but the level attained 
by the following year was representative of all older age levels. Among the normal 
group's results for the ambiguous sentences of the AC set, there was a trend toward 
the reduction of ID2 interpretations immediately after eight years which is consistent 
with the final waning of the assumption that a pronoun in the first position requires 
different identities. Although the trend was not sufficient to produce an age × set 
interaction, F(4, 3) = 1.489, P > 0.05, it does suggest that larger interactions might be 
obtained if younger children, who may be more subject to this assumption, were 
tested. 

DISCUSSION 


Retarded readers in the 8 to 12 age range have less knowledge than normal 
readers of the rules of English syntax which govern the assignment of pronominal 
reference. This finding supports the view of Klasen (1972) and others that oral 
language problems are commonly implicated in reading retardation, and Vogel's 
(1975) identification of syntax as a significant factor among oral language problems. 
In adopting a less global approach than that of Vogel, the present research has been 
able to define an area of deficit sufficiently clearly to warrant the construction of a 
specialised remedial programme. In extending the debate on the relationship between
reading retardation and syntactic abilities to relations across clause boundaries
it also has been useful for establishing the relevance of measures derived from gene- 
rative grammar which have been used in other facets of developmental research, e.g., 
Chomsky (1969), Cromer (1970) and Kessell (1970), but have remained largely 
neglected by those investigating reading retardation. 


The implications of the findings for remedial programmes require consideration. 
The similarity of the patterning of the results across sets for the retarded and control 
group suggests that reading retarded children fail to use a simple strategy for assigning 
pronominal reference which would be appropriate to their reduced level of functioning. 
Re-examination of Chomsky's (1969) results with a sample of young normal children 
reveals that of those who could unerringly apply the ID2 restriction when the main
clause with a pronominal subject preceded the subordinate, 40 per cent always gave 
the ID2 response to ambiguous sentences in which the pronominal subject preceded 
the named subject. When the order of the clauses was reversed only one child 
exercised even a single ID2 option. Thirty per cent of the children who did not always 
apply the ID2 restriction where it was required appeared to have almost mastered 
the restriction rule, but made a single mistake. This group's interpretations of the 
ambiguous sentences were similar to those of the more advanced children. If the 
pronominal subject preceded the named subject they were no more likely to give an
ID1 response than they were with the non-ambiguous sentences. More ID1 responses 
were given to the ambiguous sentences in which the named subject appeared first. 


The performance of all these children is consistent with an order principle which 
permits ID1 interpretations only if the named subject precedes the pronominal
subject. Without reference to the main versus subordinate clause distinction this rule 
avoids the generation of interpretations which are inadmissible in English syntax, 
and promotes ID2 interpretations for ambiguous sentences with first subject pronouns. 
Of the children in Chomsky's sample who were over seven years and who knew the 
rule for the non-ambiguous sentences, none followed the order rule strictly. Between 




85 and 102 months, however, only 1 of 14 subjects gave fewer ID2 responses to
ambiguous sentences with second subject pronouns when compared with first subject 
pronouns. 
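
The order principle described above can be stated as a one-line decision rule. The following is a minimal sketch rather than anything taken from the study itself: the function name `order_rule` is hypothetical, and the sketch assumes that ID1 denotes an interpretation in which the pronoun and the named subject share an identity and ID2 one in which their identities differ.

```python
def order_rule(named_subject_first: bool) -> set:
    """Interpretations permitted by the order principle (illustrative sketch).

    ID1 is allowed only when the named subject precedes the pronominal
    subject; ID2 is always allowed.
    """
    if named_subject_first:
        return {"ID1", "ID2"}
    return {"ID2"}
```

Under this sketch a sentence whose pronominal subject comes first admits only the ID2 reading, whatever the clause structure.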

Thus prior to seven years, many children who seem to have mastered these 
referential aspects of pronominal syntax base their interpretations on an order rule 
which also continues to influence the judgments of seven- and eight-year-olds about 
ambiguous sentences. The rule represents a useful strategy for children who cannot 
yet operate with a rule based on the more abstract main and subordinate clause 
distinction. Although their mastery of pronominal reference rules was less than that 
of normal readers, the retarded group gave no evidence of following the order 
principle, but produced a pattern of results like those of normal readers who were 
past the order principle stage. They did not produce more ID2 interpretations to the
ambiguous AC set which included first subject pronouns, either by comparison with 
their own performance on the IS set or by comparison with the normal readers. 

Research in cognitive development has shown many adult rules are acquired in an 
evolutionary manner whereby the mature system is preceded by child-like rules. 
Children’s early rules may serve a useful adaptive function, despite lacking the power 
of their adult successors. The stress laid by Piagetians on the child’s passage through 
primitive rule systems of his own devising before proceeding to more advanced levels 
has been an important motivator of the protracted empirical and theoretical examina- 
tions of the trainability of class inclusion, conservation and transitivity. Brainerd’s 
(1978) recent critical review of the area stated Piaget’s view to be that incorporation 
of the spontaneous laws of development is a necessary pre-condition for the success 
of training. 

Accordingly, researchers interested in devising new approaches to remediation 
for children who misconstrue pronominal reference may wish to consider using the 
order rule as a transitional stage in teaching the adult reference system. Although 
short term memory deficits have been implicated in reading retardation, e.g., Corkin 
(1974), McKeever and Van Deventer (1975), these are more likely to cause the child 
to neglect the possibilities of order strategies than to prove an absolute barrier to their 
acquisition. Corkin reported a mean digit span of 4.3 for eight- and nine-year-olds
and 6.5 for 10- and 11-year-olds. If the first subject can be held separately pending
the appearance of the second, the problem is well within the storage capacity of the 
children even allowing for the decrement in performance due to temporal delay which 
Corkin reported. Even if the whole of the first clause must be stored on a word
element basis rather than on a more economical grammatical unit basis, most
age-appropriate sentences could be coped with.

However, there are two aspects of the proposal to teach the order rule which 
require examination. An association between syntactic deficits and reading retarda- 
tion has been shown, but it remains an assumption that these deficits cause reading 
problems. Thus initially such a programme is best regarded as addressing oral 
language deficits. Nevertheless there are strong indications that syntactic deficits 
may be included among the causes of reading retardation. It is not difficult to 
appreciate the disruptive effect on the comprehension of passages which the conflict 
in information from a misconstrued pronominal sentence and the surrounding 
sentences of the passage might create. Several independent sources of research and 
theory have led investigators to assign to syntax a central role in reading.


Perfetti and Goldman (1976) concluded that their studies of discourse memory 
indicate reading comprehension is best understood as dependent upon general 
language comprehension skill. Wedell (1977), following Goodman's (1967) earlier 
characterisation of reading as a psycholinguistic guessing game, proposed that 
children who can decode even a few words of a sentence may be able to fill in the 
remainder on the basis of syntactic rules, and that a similar function may exist for
the reconstruction of partly decoded words. Regardless of the role of syntax in
normal reading, the reading retarded child who is also deficient in knowledge of 
syntax is certainly deprived of strategies which could be usefully employed in 
reconstructing materials whose initial decoding has been fragmentary. 


It must also be emphasised that the order rule should be viewed as a transitional 
strategy for the achievement of a mature rule system. The advantages of the order 
rule are that the child is alerted to the need to regard number and gender agreement as 
necessary but not sufficient conditions for identity of reference, and is provided with 
a simple method of meeting the prescriptive requirements of English grammar. In 
its latter role the order rule has a disadvantage which requires recognition. For 
sentences following the IS format its use is unexceptionable since the main + sub- 
ordinate sequence is normally fixed. Sentences of the AC format are not limited in 
this way, and therefore would be needlessly subject to the ID2 restriction when 
a dependent clause with a pronominal subject preceded the main clause. A child who
was taught the order rule would advance from making ungrammatical interpretations to
making only grammatical interpretations. However, he would be operating with a 
rule which was not sufficiently powerful to allow him access to all the possible gram- 
matical interpretations of some pronominal reference sentences. For children who 
are thought to be capable of operationalising the main:subordinate distinction it may 
be beneficial to restrict the training sentences for the order rule stage to the main + 
subordinate sequence, using the alternative sequence as a later test of operationalisa- 
tion of the main:subordinate distinction. If there are children who are not capable 
of reaching the mature rule, it may be necessary to teach those sentences which can 
occur in the subordinate + main sequence as a series of exceptions to the order rule.
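
The limitation of the order rule discussed above can be made concrete by comparing it with a rule based on the main/subordinate distinction. The sketch below is a simplified illustration under stated assumptions, not the authors' procedure: both function names are hypothetical, a sentence is reduced to two flags (whether the pronominal subject comes first, and whether it belongs to the main clause), and the mature rule is assumed to make ID2 obligatory only when a main clause with a pronominal subject precedes the subordinate clause.

```python
from itertools import product

BOTH = frozenset({"ID1", "ID2"})
ID2_ONLY = frozenset({"ID2"})


def order_rule(pronoun_first: bool, pronoun_in_main: bool) -> frozenset:
    # The order rule ignores clause type: ID1 only if the named subject comes first.
    return ID2_ONLY if pronoun_first else BOTH


def mature_rule(pronoun_first: bool, pronoun_in_main: bool) -> frozenset:
    # Assumed adult rule: ID2 is obligatory only when the pronominal subject
    # comes first AND sits in the main clause.
    return ID2_ONLY if (pronoun_first and pronoun_in_main) else BOTH


def over_restricted_cases() -> list:
    """Sentence types where the order rule forbids a reading the mature rule allows."""
    return [
        (first, in_main)
        for first, in_main in product([True, False], repeat=2)
        if order_rule(first, in_main) < mature_rule(first, in_main)  # proper subset
    ]
```

In this sketch the only over-restricted case is a pronominal subject placed first in a subordinate clause, which corresponds to the subordinate + main sequence singled out in the text; the order rule never licenses a reading that the mature rule excludes.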
THE EFFECT OF EXTRAVERSION AND DETAIL
CONTENT ON THE RECALL OF PROSE BY 
ELEVEN-YEAR-OLD CHILDREN 


By R. J. RIDING 
(Department of Educational Psychology, University of Birmingham) 


Summary. Judges allocated the 30 details in a prose passage to the six content categories 
of abstraction, action, time interval, quantity, appearance and direction of movement. 
Groups of extravert, ambivert and introvert 11-year-old children listened to the 230- 
word passage and were tested for recall after one hour. Half of each extraversion 
division were given a free-recall test while the rest answered questions. The principal
finding was of a significant interaction between extraversion and detail content in their
effect on recall. Abstractions were best recalled by extraverts, time intervals and
quantities by ambiverts, and directions by introverts. There was little difference between 
the groups in their recall of action and appearance details. These results were considered 
in terms of a possible relationship between extraversion and the mode in which informa- 
tion is represented in memory. A secondary finding was of a significant interaction 
between extraversion and recall test type in their effect on recall. 


INTRODUCTION 


WHAT is learned by listening to a prose passage is likely to be affected by the manner 
in which the information in it is represented in memory. Paivio (1971, 1975) has 
proposed a dual-coding view of learning. He suggested that information may be 
represented in memory in either an imaginal or a verbal mode. Pellegrino et al. (1977) 
have pointed out that whether such dual coding is accomplished through separate 
memory systems or simply by means of a single multi-modal memory is not clear at 
present. Further, Kosslyn and Pomerantz (1977) noted that it is also not possible to 
say with certainty whether images are an actual basic form of representing information 
in memory, or are secondary and derived from propositional representations. 
However, after reviewing the experimental evidence they concluded that both images 
and verbal representations have a functional role in learning. The results of several 
studies have shown that there are individual differences in imagery experience and 
that these are related to learning performance (e.g., Hollenberg, 1970; Ernest and 
Paivio, 1971; Marks, 1973; Riding and Taylor, 1976). 


Evidence from four studies suggests that performance in the different modes of 
representation may be influenced by extraversion, or more strictly, by the level of 
cortical arousal which extraversion is thought to reflect (H. Eysenck, 1967, pages 
230-231). These studies are: (1) the ease with which images can be evoked, (2) the 
type of errors made in learning verbal materials, (3) the effect of the order of presenta- 
tion on the learning from visual and verbal material and (4) the type of prose detail 
content most readily recalled. 


Huckabee (1974) asked groups of extravert and introvert students to rate the 
ease with which they thought abstract and concrete nouns evoked images. He found 
that the introverts had higher imagery scores than the extraverts, particularly on 
concrete nouns. However, Gale et al. (1972) found a contrary result using the Betts 
Imagery Vividness Questionnaire. Since, as Di Vesta et al. (1971) found, the intro- 
spective report of imagery tends to be saturated by social desirability which is likely 
to induce extraverts to give high ratings, this approach may not be the most useful. 


In considering word list learning, Schwartz (1975) suggested that arousal causes 
memory to be orientated toward the visual and acoustic properties of verbal material
and away from its semantic and syntactic aspects. He found that in learning lists of
words, high arousal subjects (neurotic introverts) made more errors when the words 
were phonetically similar (e.g., billow, pillow, etc.) than when they were semantically 
similar (e.g., spoon, fork, etc.). For low arousal subjects (stable extraverts) the results 
were reversed. Schwartz concluded that the high arousal subjects concentrated on the 
physical similarities of the words and so found the phonetically related ones more easy 
to confuse than the semantic ones, while the low arousal subjects focused on the 
semantic similarity and so found words from the same conceptual category confusing. 


In the third type of study, Riding and Wicks (1978) investigated learning by
eight-year-old children from colour-slide sequences presented with a tape-recorded 
commentary about each slide. Groups of extraverts, ambiverts and introverts 
received either each slide followed by its associated commentary or the reverse order 
of commentary-slide. A significant interaction between extraversion and presentation 
order was found in their effect on immediate recall score. Extraverts recalled most 
when the commentary preceded the slide to which it referred, ambiverts performed 
similarly in both conditions while introverts did best when the slide was presented 
before its commentary. A possible interpretation of these results is that extraverts
are verbally orientated and hence do best when the commentary comes before the 
slide, while introverts are inclined towards a visual representation of information and 
so prefer the slide to come first so that they can add to an image of it from the 
commentary. When the commentary comes first introverts are likely to generate 
their own imagery from it and this will not only require processing time but will 
probably interfere with the form of the picture seen on the slide. 


Finally, Riding and Parker (1979) read prose passages to groups of extravert, 
ambivert and introvert 11-year-old children. Each passage contained both main
details and elaborations (rated subjectively). While an interaction was found between 
extraversion and detail importance in their effect on free recall one hour after hearing 
a passage, an inspection of the recall of individual details suggested that there was 
probably a much more prominent interaction between extraversion and detail content 
type. Relative to the overall performance on each type of content, extraverts were 
good on simile and general verbal information, ambiverts did best on numerical and 
quantity details, while introverts performed well on details that described appearance 
and position. 


Taken together, the results of these studies suggest a relationship between 
extraversion and mode of representation, although the evidence is not very strong 
nor the role of the representations clear. The aim of the present study was to clarify 
the observation of Riding and Parker concerning the effect of detail content type by 
using a prose passage with more clearly defined and extended categories of detail 
content. Variation in recall with content could reflect either what is learned and its 
mode of representation, or simply a preference to recall details from some categories 
and not others. These possibilities will be checked by using both free recall, which 
may reflect preference, and questioned recall, which should cue retrieval and 
indicate whether the detail is in fact available for recall. 


METHOD 

Sample 

All 150 11-year-old children from two urban primary schools were given the 
Junior Eysenck Personality Inventory (S. Eysenck, 1965) and divided into extravert, 
ambivert and introvert categories. Subjects were randomly dropped to give the 
planned sample of 72 children with 24 subjects (12 boys and 12 girls) in each division. 
The extraversion divisions were: extraverts, range 20-23, mean 22.00; ambiverts,
range 16-19, mean 17.33; introverts, range 5-15, mean 13.04.
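
The allocation to divisions can be illustrated with a short sketch. It assumes that the score ranges reported above acted as the cut-offs on the extraversion scale; the source gives only the observed ranges, so the boundaries and the function name `extraversion_division` are assumptions made for illustration.

```python
def extraversion_division(score: int) -> str:
    """Map an extraversion score to a division.

    Cut-offs are taken from the ranges reported for the final sample and
    are an assumption, not a published scoring rule.
    """
    if 20 <= score <= 23:
        return "extravert"
    if 16 <= score <= 19:
        return "ambivert"
    if 5 <= score <= 15:
        return "introvert"
    raise ValueError(f"score {score} lies outside the reported ranges")
```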
Materials

The learning material was a 230-word prose passage which described an American 
moon trip. The passage was divided into 30 details with a recall test question for 
each. Fourteen judges (seven men and seven women, all teachers attending a 
university course) were given the details and questions together with the list of the 
detail category titles shown below (but not with the mode of representation) and 
asked to individually allocate the details to the six categories on the basis of question 
content. The category of each detail was taken as that selected by the majority of
the judges. The number giving the category is given in the Appendix. 
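
The allocation of a detail to the category chosen most often by the judges can be sketched as a simple tally. This is an illustration only: the function name `allocate` is hypothetical, the votes in the usage note are invented for the example, and the source does not say how a tie would have been resolved.

```python
from collections import Counter


def allocate(judgements: list) -> tuple:
    """Return (category, frequency) for the category chosen most often.

    Ties are broken by first occurrence, which is an arbitrary choice made
    for this sketch; the source does not describe a tie-breaking rule.
    """
    category, frequency = Counter(judgements).most_common(1)[0]
    return category, frequency
```

For instance, with fourteen hypothetical votes such as `["Quan"] * 7 + ["Abst"] * 4 + ["App"] * 3`, the sketch returns the category `"Quan"` with a frequency of 7.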


A pair of categories was considered to be suitable for representation in each of 
three basic modes. The categories with the number of details in each, together with 
the mode in which they might be most readily represented, were as follows: 


Detail category                       Appropriate mode of representation
(1) Abstraction and simile (seven)    Verbal
(2) Action (six)                      Verbal
(3) Time interval (six)               Numerical (mathematical)
(4) Quantity and number (three)       Numerical (mathematical)
(5) Appearance (four)                 Imaginal (spatial)
(6) Direction of movement (four)      Imaginal (spatial)


The recall performance was measured either by free recall or by questions. The 
question recall test comprised one question for each of the 30 details. Both the free 
and the question recall tests were scored one point for each detail correctly recalled. 
Since a clear brief description of categories is difficult, the passage and an indication 
of the categories of the details, together with the recall test questions, are given in the 
Appendix.
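
Because the categories contain different numbers of details, recall is most easily compared across categories as a percentage of the details available in each. The sketch below shows one way such a score could be computed; the function name and the example child are hypothetical, and only the category sizes come from the text.

```python
# Number of details per category, as listed in the Materials section.
DETAILS_PER_CATEGORY = {
    "Abstraction": 7,
    "Action": 6,
    "Time interval": 6,
    "Quantity": 3,
    "Appearance": 4,
    "Direction": 4,
}


def percentage_recalled(correct_by_category: dict) -> dict:
    """Convert a count of correctly recalled details per category to a percentage."""
    return {
        category: 100.0 * correct_by_category.get(category, 0) / total
        for category, total in DETAILS_PER_CATEGORY.items()
    }
```

A hypothetical child who recalled two of the three quantity details would score about 66.7 per cent on that category.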


Procedure 

The subjects in each extraversion division were randomly divided into two groups 
of 12 in each (six boys and six girls). All the subjects listened to the prose passage and 
then one hour later one group from each division free recalled the details while the 
other group was read the recall test questions in the same order in which the details 
occurred in the passage and wrote down their answers. Presentation and testing was 
done in groups. A one-hour interval between presentation and recall was used to 
ensure that recall was from long-term memory. The hour was occupied by normal 
school activities. 


RESULTS AND DISCUSSION 


A four-way analysis of variance with repeated measures on detail content was 
performed and this is summarised in Table 1. This shows that while extraversion 
did not have a significant main effect on recall, the interaction between extraversion 
and detail content was highly significant. There was also a significant interaction 
between extraversion and recall test type. Before considering these results in detail, 
the other findings will be briefly noted. The main effect of content type was significant 
and may be due either to some categories being easier to learn or easier to recall. Sex 
did not have a consistent effect either by itself or in interaction with other variables. 
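
The F ratios summarised in Table 1 follow the usual pattern for a design with repeated measures on one factor: each between-subjects effect is divided by the mean square for subjects within groups, and each effect involving detail content by the mean square for the D x subjects within groups term. The sketch below recomputes a few of the ratios from the mean squares printed in the table. The variable names are illustrative, and the label "AD" for the extraversion x detail content row is an assumption, since the printed label is not legible in this copy.

```python
# Error mean squares as read from Table 1.
MS_BETWEEN_ERROR = 1150.80  # Subjects within groups (df = 60)
MS_WITHIN_ERROR = 412.57    # D x Subjects within groups (df = 300)

# Effect mean squares paired with the error term they are tested against.
EFFECTS = {
    "Extraversion (A)": (2988.20, MS_BETWEEN_ERROR),
    "Recall test type (C)": (20364.00, MS_BETWEEN_ERROR),
    "AC": (4091.90, MS_BETWEEN_ERROR),
    "Detail content type (D)": (21842.00, MS_WITHIN_ERROR),
    "AD": (1370.10, MS_WITHIN_ERROR),
}


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F is the effect mean square divided by the appropriate error mean square."""
    return ms_effect / ms_error


F_VALUES = {name: round(f_ratio(ms, err), 2) for name, (ms, err) in EFFECTS.items()}
```

The recomputed values agree with the F column of the table to two decimal places, which is a useful check on the legibility of the printed figures.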


In order to illustrate the interaction between extraversion and detail content 
type, the overall recall performance for each category is given in Table 2. Performance 
on the content categories varied with extraversion in a manner that was generally in 
keeping with the expectations. Inspection of Table 2 indicates that relative to the 
performance of the other extraversion divisions, extraverts did best on abstractions. 
For action details the difference between the groups was fairly small. Ambiverts 
were superior on time intervals and quantity. Appearance performance was similar 
TABLE 1

SUMMARY OF ANALYSIS OF VARIANCE

Source                           df         MS        F      P<
Between Subjects
  Extraversion (A)                2    2988.20     2.60
  Sex (B)                         1      11.34       <1
  Recall test type (C)            1   20364.00    17.70   0.001
  AB                              2     590.91       <1
  AC                              2    4091.90     3.56   0.05
  BC                              1       7.26       <1
  ABC                             2    1385.20     1.20
  Subjects within groups         60    1150.80

Within Subjects
  Detail content type (D)         5   21842.00    52.94   0.001
  AD                             10    1370.10     3.32   0.001
  BD                              5     493.50     1.20
  CD                              5     551.39     1.34
  ABD                            10     320.29       <1
  ACD                            10     646.60     1.57
  BCD                             5     385.17       <1
  ABCD                           10     309.27       <1
  D x Subjects within groups    300     412.57


TABLE 2

RECALL PERFORMANCE BY EXTRAVERSION GROUPS OF DETAIL CONTENT TYPES

                  Mean percentage of details in each category
                  recalled for extraversion group
Detail content
type              Extraverts    Ambiverts    Introverts
Abstraction          67.2          64.8         55.4
Action               57.6          61.2         52.2
Time interval        72.9          76.4         52.8
Quantity             68.0          86.1         66.7
Appearance           54.2          56.3         49.0
Direction            21.9          17.7         32.3


for all groups, and introverts were highest on directional details. Planned comparison
tests showed that abstraction, action and appearance details interacted significantly
with the categories of time interval and quantity (minimum P < 0.05). Time interval
and quantity interacted with one another significantly (P < 0.01), and the interactions
between direction details and each of the other categories were highly significant
(P < 0.001).
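
The pattern in Table 2 can be summarised by finding, for each category, the group with the highest mean and the spread between the highest and lowest group means. The following sketch does this with the percentages read from the table; the function names are illustrative, and the spread is offered only as a descriptive aid, not as a substitute for the planned comparisons reported above.

```python
GROUPS = ("Extraverts", "Ambiverts", "Introverts")

# Mean percentage recalled, in the order of GROUPS, as read from Table 2.
TABLE_2 = {
    "Abstraction": (67.2, 64.8, 55.4),
    "Action": (57.6, 61.2, 52.2),
    "Time interval": (72.9, 76.4, 52.8),
    "Quantity": (68.0, 86.1, 66.7),
    "Appearance": (54.2, 56.3, 49.0),
    "Direction": (21.9, 17.7, 32.3),
}


def best_group(category: str) -> str:
    """Group with the highest mean percentage for a category."""
    scores = TABLE_2[category]
    return GROUPS[scores.index(max(scores))]


def spread(category: str) -> float:
    """Difference between the highest and lowest group means, to one decimal."""
    scores = TABLE_2[category]
    return round(max(scores) - min(scores), 1)
```

On these figures the two smallest spreads belong to the appearance and action categories, in line with the remark that the groups differed little on those details.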


In considering the reasons for the interactions, the first point to be noted is that 
there was no significant interaction between detail type and recall test type. This
suggests that the variation with extraversion is not merely a matter of preference for 
recalling certain types of detail, but rather that some types are more available for 
recall than others. 


A possible explanation for the findings is that they reflect differences in the way 
in which information is represented in memory as introversion, or arousal, increases. 
A tentative model might be very briefly as follows. There are at least three basic
forms of representing information in memory and these are the verbal, numerical (or
mathematical) and imaginal (or spatial) modes. The efficiency of the modes varies 
with arousal so that at low arousal (extraverts) the verbal mode is superior, at 
moderate arousal (ambiverts) the numerical mode performance is best and at high 
arousal (introverts) the imaginal mode is most efficient. Consequently the order of 
performance in the three modes by the three extraversion groups is: extraverts, best 
in the verbal, next in the numerical and worst in the imaginal mode; ambiverts, 
superior in the numerical and an equal second best in the verbal and imaginal modes; 
introverts, best in the imaginal, next in the numerical and worst in the verbal mode. 
Whenever possible information will be represented by a learner in the mode in which 
he is most efficient. Material may often be represented with roughly equal efficiency 
in more than one mode. 


TABLE 3

RECALL PERFORMANCE FOR RECALL TEST TYPES

               Mean percentage of details recalled
               for extraversion group
Recall test
type           Extraverts    Ambiverts    Introverts
Free              43.9          56.9         47.3
Questions         70.0          63.9         55.4

Applying the model to two of the categories, by way of illustration, the time 
interval details were best recalled by the ambiverts because these were appropriate for 
coding in numerical form. The extraverts were next in performance because the 
details were also amenable to verbal coding while the introverts found the imaginal 
code unsuitable and had to use their second best mode of numerical coding. With 
the abstraction details the extraverts did best because they used the verbal code which 
was most suitable. The ambiverts were next in performance by using the verbal code 
which was their second best, and the introverts were worst because the verbal mode 
was their least efficient and abstractions are often difficult to code as images. 
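
The tentative model and its application to the two illustrative categories can be expressed as a small lookup. This is a sketch under stated assumptions rather than a formal statement of the model: the rank values simply encode the orderings given in the text (1 = most efficient), the function name `chosen_mode` is hypothetical, and the sets of modes suitable for each kind of detail are read from the two worked illustrations.

```python
# Efficiency ranks for each group (1 = best), encoding the orderings in the text.
MODE_RANK = {
    "extraverts": {"verbal": 1, "numerical": 2, "imaginal": 3},
    "ambiverts": {"numerical": 1, "verbal": 2, "imaginal": 2},
    "introverts": {"imaginal": 1, "numerical": 2, "verbal": 3},
}


def chosen_mode(group: str, suitable_modes: set) -> str:
    """Pick the suitable mode in which the group is most efficient.

    Ties are broken alphabetically, an arbitrary choice made for this sketch.
    """
    ranks = MODE_RANK[group]
    return min(sorted(suitable_modes), key=lambda mode: ranks[mode])


# Assumed from the worked illustrations: time intervals suit numerical or
# verbal coding, abstractions suit verbal coding only.
TIME_INTERVAL_MODES = {"numerical", "verbal"}
ABSTRACTION_MODES = {"verbal"}
```

With these assumptions the sketch reproduces the account given above: for time intervals the ambiverts and introverts code numerically and the extraverts verbally, while for abstractions every group is confined to the verbal mode, which is the introverts' least efficient.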


Turning now to the secondary finding of the present study, there was a
significant interaction (P < 0.05) between extraversion and recall test type in their
effect on recall. The form of this can be seen in Table 3. Free recall by extraverts
and introverts was fairly similar while ambiverts were slightly superior. On questioned 
recall all groups did better, but the increase was greatest for extraverts. The reason 
for this interaction is not clear, although several possibilities suggest themselves. 
One is that extraverts are more sensitive to the effect of cues because of the way in 
which they organise information in memory. Another is that they respond more 
enthusiastically to the social situation of being continually questioned as opposed to 
being asked to free recall and then left to the task of writing out their account. 
Thirdly, recall performance may follow a typical inverted ‘U’ function as introversion
(arousal) increases. If questioned recall is more arousing than free recall then this 
change in arousal coupled with the elevation of the performance curve due to the 
cueing effect of the questions could result in a shift in the position on the curve and 
change the order of performance. In the present study the recall questions were read 
to the subjects. A study employing a printed recall test would help to clarify some of 
these possibilities. 
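
The form of the interaction can be seen by computing, for each group, the gain from free to questioned recall. The sketch below uses the percentages read from Table 3; the names are illustrative and the calculation is purely descriptive.

```python
# Mean percentage of details recalled, as read from Table 3.
FREE = {"Extraverts": 43.9, "Ambiverts": 56.9, "Introverts": 47.3}
QUESTIONS = {"Extraverts": 70.0, "Ambiverts": 63.9, "Introverts": 55.4}


def recall_gain() -> dict:
    """Percentage-point gain from free to questioned recall for each group."""
    return {group: round(QUESTIONS[group] - FREE[group], 1) for group in FREE}


GAINS = recall_gain()
LARGEST_GAIN_GROUP = max(GAINS, key=GAINS.get)
```

On these figures every group gains from questioning, and the extraverts' gain is by far the largest.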


Generally, the present results are consistent with the notion that there are 
different modes of representing information in memory, and that the mode employed 
varies with the degree of extraversion of the learner. This may explain why there is
often no overall effect of extraversion observed in studies of prose learning (e.g.,
Jones, 1976). Further studies are needed to determine more precisely the nature and 
number of the modes of representation and the manner in which performance in them 
varies with arousal.
APPENDIX


PROSE PASSAGE DETAILS AND RECALL TEST QUESTIONS 


Details are listed in order of presentation. Each detail is followed by its recall
test question and, in brackets, the detail type most frequently given by the judges
with its frequency (maximum 14).

(1) The story is about a giant sized rocket.
    What size was the rocket? (App 9)
(2) The rocket gives the power.
    What gave power for the spaceship? (Abst 9)
(3) On top like a big tennis ball is a spaceship.
    What was like a big tennis ball? (Abst 12)
(4) The spacemen travel in a round capsule.
    What did the capsule look like? (App 12)
(5) They wear silver-coloured spacesuits.
    What colour were the spacesuits? (App 13)
(6) In the year 1969,
    In which year did the astronauts set out for the Moon? (Tim 12)
(7) three astronauts
    How many astronauts were there? (Quan 14)
(8) climbed into the lift.
    What did the astronauts do first? (Act 10)
(9) It went up 350 feet.
    How high did the lift go up? (Quan 7)
(10) This is the same height as St. Paul’s Cathedral,
    What is the height the lift goes up compared with? (Abst 12)
(11) The countdown lasted for two hours
    How long did the countdown last for? (Tim 13)
(12) Up into the air the rocket soared.
    What happened to the rocket after the countdown? (Dir 12)
(13) After five minutes the first stage of the rocket fell away.
    How long was it before the first stage of the rocket fell away? (Tim 13)
(14) Into outer space they went,
    Where did they go after the first stage of the rocket fell away? (Dir 11)
(15) until the Earth looked as tiny as a golf-ball.
    What looked as tiny as a golf-ball? (Abst 10)
(16) The spacemen took photographs of the Moon’s surface.
    What did the spacemen take photographs of? (Act 10)
(17) Soon it was time to go down to the Moon in the moonbug.
    Where did they go in their moonbug? (Dir 12)
(18) They touched down where they could see craters.
    What part of the Moon did they touch down on? (Act 7)
(19) Walking on the Moon was easy.
    Was it easy or hard to walk on the Moon? (Act 9)
(20) They breathed oxygen from their cylinders.
    What did they breathe from their cylinders? (Act 12)
(21) They collected rocks to take home for scientists.
    What did they collect to take home? (Act 12)
(22) They planted their stars and stripes flag.
    What sort of flag did they plant? (App 8)
(23) The Moon looked a friendlier place.
    What did the Moon look like? (Abst 8)
(24) Eventually they returned to their three-legged spaceship.
    How many legs had the spaceship? (Quan 11)
(25) After they had climbed up the ladder they closed the hatch.
    How did they get back into their spaceship? (Dir 10)
(26) About twenty hours after arriving on the Moon it was homeward bound for them.
    How many hours did they spend on the Moon? (Tim 11)
(27) The journey had lasted eight days.
    How long had the journey lasted? (Tim 14)
(28) They were kept by themselves for a further eighteen days to check for illness.
    How long were they kept by themselves? (Tim 13)
(29) Their families were pleased to see them.
    Who were pleased to see them? (Abst 8)
(30) The journey into space had made them famous.
    What had the journey into space made them? (Abst 9)
THE CONSERVATION TASK AS AN INTERACTIONAL
SETTING 


By P. H. LIGHT, M. BUCKINGHAM AND A. H. ROBBINS
(Department of Psychology, University of Southampton) 


SUMMARY. Two experiments are reported, the first being a partial replication of 
McGarrigle and Donaldson (1975). Sixty children, average age 5 years 6 months, were 
tested on conservation of length both in a standard condition where the transformation 
was clearly a deliberate act of the tester and in an ‘accidental’ condition, where the
transformation was occasioned by an errant teddy bear. Overall frequency of conserving 
judgments was lower than that obtained by McGarrigle and Donaldson, but their
finding of a higher success rate in the accidental condition was confirmed. The second
experiment represents a different approach to the same issue. Eighty children, average 
age 6 years 0 months, were tested on conservation of discontinuous quantity in pairs. 
Half of the children were given a standard presentation, while for the other half the 
transformation was made to seem incidental to a competitive game. Conservation 
rates were 5 and 70 per cent respectively. The implications of these results for our 
understanding of the transition to operational thinking are discussed. 


INTRODUCTION 


ONE of the unfortunate consequences of the demarcation which has grown up between 
the study of social and of cognitive aspects of development is that cognitive testing 
situations are rarely considered from an interactional point of view. Conservation 
measures, devised by Piaget (e.g., 1952) and adapted by innumerable investigators 
since, are a case in point. While some attention has been paid to strictly linguistic 
aspects of the conservation task (e.g., Griffiths et al., 1967), the interchange between 
tester and child has rarely been critically examined as a social encounter. 


Doise (1978) discusses the need for a social psychology of cognitive development, 
but his own studies (e.g., Doise et al., 1975) have considered interactions between 
subjects when groups of children are tested together rather than the tester-child 
interaction in the more usual individual test setting. Rose and Blank (1974) investi- 
gated the possibility that the request for two judgments in the standard conservation 
task (one before and one after transformation of the materials) might be an important 
element in the interactional context. They hypothesised that the request for the 
second judgment might be taken by the child as indicating that he should now alter 
his first judgment in recognition of the change introduced by the tester in transforming 
the array. Children’s performance was compared on a standard conservation task 
and on a modified version in which the initial request for a judgment was omitted. 
Children made fewer errors on the modified task, as predicted. This result seems to 
confirm that children are sensitive to the extra-linguistic cues provided within the task 
setting. 


McGarrigle and Donaldson (1975) report an ingenious experiment which 
provides further evidence on this point. They suggest that the tester’s action in 
transforming the array may lead the child to infer an intention on the tester’s part to 
talk about what he has just been doing. In a conservation of number task, for
example, the fact that the tester transforms one of the rows in terms of length might 
suggest to the child that the following judgment bears on the question of length. So 
instead of attending to precisely what the question means, he may answer instead in 
terms of what he thinks the tester means by the question. McGarrigle and Donaldson 
refer to the distinction between utterance meaning and speaker's meaning drawn by 
Grice (1968), going on to suggest that in developmental terms an understanding of 
the latter may precede an understanding of the former. 
In order to investigate the importance of the intentional structure of the speaker’s
non-linguistic behaviour for the child’s interpretation of utterances, they again 
compared the outcomes of standard and modified conservation tasks. In the modified 
version, the transformation of materials was achieved in a way designed to appear 
accidental to the child, rather than as an obviously deliberate act of the tester. This 
was achieved by the device of a ‘naughty’ teddy bear, manipulated quite openly by
the tester. The teddy bear was introduced to the child before the task and described
as being intent upon spoiling the game. After the first judgment in the conservation
task the teddy ‘escaped’, rushed around, and achieved the requisite transformation
of the materials ‘accidentally’. Eighty children aged between 4 and 6 years were
tested on standard and modified versions of conservation of length and number tasks. 
Fifty children conserved both equality and inequality of length and of number when 
the transformation was accidental, whereas only 13 met this criterion in the standard 
condition. 


The present study includes a partial replication of McGarrigle and Donaldson's 
experiment (Experiment 1), and a further investigation of the same problem using a 
rather different procedure (Experiment 2). The two experiments will be described in 
separate sections, followed by a general discussion. 


EXPERIMENT 1 

Procedure 

Sixty children from one Southampton first school participated, the sexes being 
equally represented and the age range being 4 years 9 months to 6 years 5 months 
(mean 5 years 6 months). The children were tested on conservation of equality and 
of inequality of length. Each of the two tests was given in two conditions, one where 
the transformation was deliberate (the standard condition) and the other where it was 
consequent upon the escape of the teddy bear (the accidental condition). Materials 
and procedures for the two conditions were modelled exactly on those reported by 
McGarrigle and Donaldson (1975). 


Children were randomly assigned to one of two groups. One group were given 
both tests in the accidental condition before being tested on the standard condition 
while for the other group the order was reversed. Within groups half the children 
were given the conservation of equality tests first and half the conservation of 
inequality tests first, the order of presentation being the same in both conditions. 


Results 

Following McGarrigle and Donaldson, children were judged as conserving or 
non-conserving on the basis of their post-transformation judgments, justifications not 
being required. In the standard condition of presentation 11 of the 60 children 
conserved inequality of length and 10 conserved equality of length. In the accidental 
condition 21 conserved inequality and 17 conserved equality. 


Table 1 shows how conservation judgments were related across the two presenta- 
tion conditions for each of the tasks. To test the significance of the differences in 
response in the two conditions the McNemar test for the significance of changes 
(Siegel, 1956) was used. For conservation of inequality the difference between 
conditions was significant (χ² = 8.07, P < 0.01). For conservation of equality the
difference was less marked (χ² = 3.3), being significant (P < 0.05) only if a one-tailed
test is judged appropriate. 
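
As a check on the arithmetic, the McNemar statistic can be recomputed from the two discordant cells of Table 1 (children whose judgment changed between conditions). The sketch below is an illustration, not part of the original analysis: it assumes the continuity-corrected form of the statistic, and the function name `mcnemar_chi2` is hypothetical.

```python
def mcnemar_chi2(changed_to_c, changed_to_nc):
    """Continuity-corrected McNemar chi-square (1 df) from the two discordant counts.

    Assumption: the corrected form (|A - D| - 1)^2 / (A + D) is the one intended.
    """
    n_changes = changed_to_c + changed_to_nc
    if n_changes == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return (abs(changed_to_c - changed_to_nc) - 1) ** 2 / n_changes


# Discordant cells read from Table 1: non-conserving in the standard condition
# but conserving in the accidental condition, and the reverse.
inequality = mcnemar_chi2(10, 0)
equality = mcnemar_chi2(9, 2)

print(f"inequality: chi2 = {inequality:.2f}")
print(f"equality:   chi2 = {equality:.2f}")
```

Under that assumption the statistic comes out at about 8.1 for inequality and about 3.3 for equality, close to the values quoted above. With 1 degree of freedom the first exceeds 6.63 (the 0.01 level), while the second exceeds only 2.71 (the one-tailed 0.05 level).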


Table 2 shows the data for the two tasks combined. Children may be classified
as conservers if both conservation judgments were made correctly (the strict criterion)
or if either one of them was made correctly (the lax criterion). Using the strict
criterion the changes between the two conditions of presentation do not reach
statistical significance (McNemar test, χ² = 2.5), but using the lax criterion the changes
are clearly significant (χ² = 8.1, P < 0.01).
There was no significant effect of order of presentation of the two conditions.


TABLE 1

THE RELATIONSHIP OF CONSERVING (C) AND NON-CONSERVING (NC) JUDGMENTS UNDER THE TWO
CONDITIONS OF PRESENTATION FOR EACH OF THE TWO TASKS

                          Accidental condition
                      Inequality         Equality
Standard condition     C     NC          C     NC
NC                    10     39          9     41
C                     11      0          8      2


TABLE 2

THE RELATIONSHIP OF CONSERVING (C) AND NON-CONSERVING (NC) JUDGMENTS UNDER THE TWO
CONDITIONS OF PRESENTATION USING STRICT AND LAX CRITERIA OF CONSERVATION

                          Accidental condition
                    Strict criterion    Lax criterion
Standard condition     C     NC          C     NC
NC                     8     44         10     37
C                      6      2         13      0


Discussion 

Levels of correct responding were very much lower in the present study than in 
McGarrigle and Donaldson’s original experiment. While they found a mean of 
approximately 70 per cent correct judgments in the accidental condition and 30 per 
cent in the standard condition, our results are approximately 30 per cent in the
accidental condition and 17.5 per cent in the standard condition. The difference may well
reflect differences between the samples, despite the comparability of ages. McGarrigle
and Donaldson give no details of the socio-economic backgrounds of the children
tested, but the school from which the present sample was drawn served a mainly
working class area.

The low level of conserving judgments inevitably limits the extent of the differences 
between the two conditions of presentation. However, differences in the expected 
direction were apparent, and were satisfactorily statistically significant for conservation 
of inequality of length and for combined scores when a criterion of one correct 
judgment was applied. Donaldson (1978, p. 64) mentions in a footnote an unpublished 
study by Dockrell replicating the findings of McGarrigle and Donaldson (1975) but 
finding a less marked effect. This is consistent with the present findings.


EXPERIMENT 2 
Rationale 

Since the child’s interpretation of the tester’s intentions is central to our concerns, 
it is unfortunate that the ‘naughty teddy’ device involves some ambiguities in this
respect. While the children in the first experiment were willing to ‘play the game’
by attributing agency to the teddy bear, they clearly also knew that the tester was
responsible for both introducing and manipulating it. The term ‘accidental’ is
perhaps a misnomer because the teddy bear was supposedly trying to spoil the game.
But the extent to which the child holds separate the intentions of the tester and those
of the teddy must remain in doubt. As any parent knows, children at this age have
an unnerving tendency to ‘step outside’ role-playing situations of this kind just when
the adult has been drawn in most deeply! 


In Experiment 2 the basic requirements of the conservation task are preserved, 
but they are placed in the context of a competitive game between two children. The 
test of conservation is no longer the focus of the tester/child interaction, but is made 
incidental to the proceedings involved in setting up the game. Given the account 
offered by McGarrigle and Donaldson, it is hypothesised that conserving responses 
should be more frequent in this ‘incidental’ condition than in a standard test of
conservation. 


Sample and materials 

Eighty children (40 boys and 40 girls) with a mean age of 6 years 0 months, 
range 5 years 7 months to 6 years 7 months, participated in the experiment. The 
sample was drawn from three Southampton first schools which drew children from 
a wide range of socio-economic backgrounds. 


The experiment was conducted in a quiet room containing a table and three 
chairs. Materials consisted of three small Pyrex beakers (250 ml) one of which was 
inconspicuously chipped along the rim, one large beaker (500 ml), 60 pasta ‘shells’
and two grids, each cell of which was large enough to take one of the pasta shells. 


Procedure 

The conservation task used was based upon Piaget’s (1952) test of conservation of 
discontinuous quantity. Subjects were divided into two groups, taking care to match 
the groups for age, sex, school, etc. A ‘between-subjects’ design was used, since
although significant order effects were not found in Experiment 1, they have been 
reported in other studies (Rose and Blank, 1974; McGarrigle and Donaldson, 1975). 
Within each group the children were randomly allocated into 20 same-sex pairs, the 
conservation task being given to pairs of subjects in both conditions. 


In the standard condition of presentation each pair of children was presented 
with two small beakers and a pile of pasta shells, and the children were told that they 
were going to be asked some questions about them. The tester then put shells into 
the two beakers in handfuls and poured them to and fro as necessary until the children 
both agreed that the two amounts were equal. The large beaker was then presented 
and the tester poured the contents of one of the small beakers into it. A judgment 
was then obtained from each child in turn as to whether the two amounts were the 
same. The order of questioning of the children was determined by age: if in one 
pair the younger child was asked for judgments first, then in the following pair the 
older child would be asked first, and so on. 


In the ‘incidental’ condition each pair was initially presented with the same 
materials except that one of the small beakers was the chipped one. In addition each 
of the children was given one of the grids. They were told that they were going to 
play a game involving putting the shells into the cells of the grid, the first to use up
all his shells winning the game. It was stressed that for the game to be fair the children 
should each begin with the same amount of shells. The shells were then placed in 
beakers and initial judgments of equality obtained as in the standard condition. The 
tester, announcing that the game could now begin, handed the intact beaker to the 
second subject in order of questioning and the chipped beaker to the other. At this
point the tester ‘noticed’ the chipped edge, which was very sharp and obviously 
potentially dangerous. Expressing surprise and alarm at this, the tester searched for 
an alternative beaker which would be safe to use and produced the larger one. 
Reiterating the importance of equality for the game to be fair the tester elicited 
judgments as to the equality or inequality of amounts as in the standard condition. 
The children then went on to play the game. 


Although it was obviously necessary to make the interaction in the incidental condition
appear as natural and unscripted as possible, in fact the experimenter’s wording was 
closely scripted, and this applied especially to the requests for judgments in the two 
conditions. Inevitably there was a longer interval between judgments in the incidental 
condition than in the standard condition, but McGarrigle and Donaldson (1975) have 
studied the influence of this factor and found it negligible. 


Results 
Children were classified as conserving or non-conserving on the basis of their
post-transformation judgments. Table 3 gives the results for all 80 subjects and also
the results for the first child questioned in each pair, which should be free from any
influence of one child's judgments upon the other's within the pairs.


TABLE 3

NUMBERS OF CONSERVING AND NON-CONSERVING JUDGMENTS IN EACH CONDITION, GIVEN BOTH FOR THE
TOTAL SAMPLE AND FOR THE FIRST CHILD IN EACH PAIR IN ORDER OF QUESTIONING

                        Total sample      First questioned
                         C     NC           C     NC
Standard condition       1     39           1     19
Incidental condition    26     14          14      6


Conserving judgments were very much more frequent in the incidental condition. 
Taking the data for the first questioned of each pair, only 5 per cent conserved in the 
standard condition whereas 70 per cent conserved in the incidental condition. 
Differences were statistically significant both for the total sample (χ² = 32, P < 0.001)
and for the first questioned of each pair (χ² = 18, P < 0.001).
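
The chi-square values can be checked against the cell counts in Table 3. The text does not say whether a continuity correction was applied, so the sketch below computes both forms. The function name `chi2_2x2` and the `yates` flag are illustrative assumptions, not the authors' procedure.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]].

    With yates=True the Yates continuity correction is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * diff ** 2 / denom


# Rows: standard, incidental; columns: conserving, non-conserving (Table 3).
tables = {"total sample": (1, 39, 26, 14), "first questioned": (1, 19, 14, 6)}
results = {
    name: (chi2_2x2(*cells), chi2_2x2(*cells, yates=True))
    for name, cells in tables.items()
}
for name, (plain, corrected) in results.items():
    print(f"{name}: uncorrected {plain:.1f}, Yates-corrected {corrected:.1f}")
```

For the total sample the corrected value is about 32.2, and for the first-questioned children the uncorrected value is about 18.0. Whichever form is used, every value exceeds 10.83, the 1-df critical value at the 0.001 level.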


There were no significant differences associated with sex of subjects or school, 
and children asked second within each pair were no more or less likely to conserve 
than children asked first. There was some evidence of social influence within pairs. 
In the standard condition only one individual gave a conserving response. In the 
incidental condition there were 14 cases where a conserving judgment was given by the 
first child and in 11 of these cases the second child also conserved. There were six 
cases where a non-conserving judgment was given first, and in five of these the second 
child also failed to conserve (Fisher test, P < 0.025).
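
The within-pair counts for the incidental condition form a 2 x 2 table: of 14 pairs in which the first child conserved, 11 second children conserved and 3 did not; of 6 pairs in which the first child did not conserve, 1 second child conserved and 5 did not. A minimal sketch of a one-tailed Fisher exact test on that table follows. The helper name `fisher_one_tailed` is hypothetical, and the choice of a one-tailed test is an assumption, since the text does not specify it.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1)
    )
    return favourable / total


# Rows: first child conserved / did not conserve;
# columns: second child conserved / did not conserve.
p = fisher_one_tailed(11, 3, 1, 5)
print(f"one-tailed P = {p:.4f}")
```

Under these assumptions the probability is about 0.018, which is consistent with the reported P < 0.025.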


Discussion 

The difference in outcome between the two conditions is quite as dramatic as that 
reported by McGarrigle and Donaldson (1975). The much higher levels of conserva- 
tion judgments in Experiment 2 than in Experiment 1 may in part be due to the 
slightly greater average age and wider socio-economic range of the second sample. 


There are indications in the social psychological literature (e.g., Sherif and 
Sherif, 1969) that judgments offered in a group situation will be influenced by one
another. A few conservation studies have involved testing two or more children 
simultaneously (Miller and Brownell, 1975; Doise et al., 1975), and these have shown 
interesting patterns of social influence. Present results may best be interpreted simply 
as a tendency on the part of the second child to conform to the first child’s judgment. 
Since pairs of children were tested in both the standard and the incidental condition, 
such conformity cannot account for the differences found between the conditions. 
It can and does inflate these differences, however, so that a conservative interpretation 
of the results must be based only upon the judgments of the first child in each pair. 


The introduction of the ‘game’ in the incidental condition served the important
purpose of providing a meaningful context for the conservation judgments. One of 
its drawbacks was that since it depended upon the children initially having equal 
amounts of the materials, the experiment could not easily be extended to include a 
test of conservation of inequality. Rose (1973) has pointed out the advantages of 
incorporating both these tasks when relying upon judgments as a basis for assessing 
conservation. However, in so far as we have classified children as conservers or 
non-conservers on the basis of a single judgment, we have done so merely for con- 
venience of exposition. Our concern is solely with the factors influencing that 
judgment itself; the question of whether the children are ‘really’ conservers or not
will be taken up in the more general discussion which follows. 


GENERAL DISCUSSION 


The results of both the experiments reported support the view that successful 
conservation judgments are more likely to be made by young children when the 
transformation of materials in the task is made to seem like an accident, or to seem 
incidental to the main purpose of the interaction. The first experiment replicated 
McGarrigle and Donaldson’s (1975) study, though only conservation of length was 
tested. Overall levels of conservation were much lower than those found in the 
original study, but the predicted difference between the two conditions was still 
apparent. The second experiment involved a quite different situation and did not 
depend upon the child's willingness to attribute independent agency to an inanimate 
object. Instead it capitalised upon young children's understanding of ‘fairness’ in
the context of a competitive game. When the transformation was rendered incidental 
to the proceedings, a clear majority of the children made correct conservation 
judgments, whereas only a single child did so in the standard condition of presentation. 


In both experiments care was taken to ensure that the perceptual aspects of the 
arrays before and after transformation were similar in the two conditions, so that the 
results cannot readily be interpreted in terms of misleading perceptual cues. Likewise
the verbal formulation of the requests for judgment was held constant in the two
conditions. The children were being asked exactly the same questions in the two 
conditions, but were giving very different answers. The results, then, support the 
view that children at this age utilise features of the non-linguistic context in interpret- 
ing the questions which they are being asked. In addition to features of the static 
perceptual array, the non-linguistic context includes the tester's actions and the child's 
interpretation of his underlying intentions. 


McGarrigle and Donaldson (1975, p. 347) conclude that their results “give
clear indications that traditional procedures for assessing conservation seriously 
underestimate the child's knowledge”. This seems a curious conclusion, apparently 
based upon the assumption that because the accidental condition generated a higher 
proportion of conserving judgments it must be a more sensitive index of the child's 
underlying ability. Failures in the standard condition are seen as ‘false negatives’,
non-conservation arising from the implicit message: this transformation is important,
contained in the tester’s action. Should we not then regard successes in the accidental
condition as ‘false positives’, conservation arising from the implicit message: this
transformation is irrelevant, contained in the tester’s action? If non-conserving 
judgments in the standard condition are artefactual, in other words, then conserving 
judgments in the accidental (or incidental) condition may also be artefactual. 


We seem thus to be further from, rather than nearer, an unbiased assessment of 
the child’s logical abilities. But, as McGarrigle and Donaldson rightly point out, the 
findings of these experiments may lead to a new conception of some of the changes 
which for Piaget mark the transition to operational thinking. Piaget (e.g., 1950) has 
spoken of social decentration as paralleling cognitive decentration, but in his later 
work he has studied the latter largely to the exclusion of the former. Present findings 
highlight difficulties which arise from this separation. A good deal more develop- 
mental evidence is needed, but it may transpire that an important aspect of the 
‘transition to operativity’ is the establishment of that degree of personal autonomy
or detachment which enables the child to separate the meaning of words from the 
meaning of the contexts in which they are uttered. 
REFLECTION-IMPULSIVITY AND READING ABILITY IN
SEVEN-YEAR-OLD CHILDREN 


By TESSA ROBERTS 
(Department of Adult and Higher Education, University of Manchester) 


Summary. Reading ability and performance on the Matching Familiar Figures Test were 
compared for two samples of seven-year-old children, a sample of 70 constituting a complete 
top infant year group in two infant schools and a sample of 42 poor readers from nine top infant 
classes in the same city. Considerably more of the poor readers than the main sample were 
found to be impulsive. There was a consistent tendency for girls to be more reflective than boys. 


INTRODUCTION 


The hypothesis that the speed with which an individual typically makes a decision in a 
situation of uncertainty will affect his performance on certain tasks encountered in school 
has been the subject of many studies by Kagan and his co-workers. A comprehensive 
review of such work is offered by Messer (1976). 


Interest has largely focused on the impulsive, who responds quickly but inaccurately,
and the reflective, who responds slowly but with accuracy. The most widely used measure 
to determine how far an individual tends towards reflection or impulsivity is the Matching 
Familiar Figures Test (MFFT) developed by Kagan et al. (1964). 


This test creates a situation of response uncertainty. The subject is shown a figure and 
at the same time six alternative versions of the same figure, all of which are similar, only one 
being identical. The child is required to select the identical figure. The time to his first 
response after presentation and the number of errors he makes before he selects the right 
figure are recorded and thus two scores are obtained, one for time and one for errors. 
Typically, the impulsive is low on time and high on errors and the reflective high on time and 
low on errors. 


In many studies where the MFFT is used, the sample is divided according to a median 
split on the time and error scores which provides four groups: 


Impulsives: above median on errors, below on time; 
Reflectives: below median on errors, above on time; 
Slow inaccurates: above median on errors, above on time; 
Fast accurates: below median on errors, below on time. 


It is commonly found that roughly two-thirds of a random sample fall into the first two 
categories. 
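
The four-way classification above can be sketched in code. This is an illustration only: the scores are hypothetical, the function name `classify_mfft` is invented for the example, and the handling of scores that fall exactly on a median (treated here as "below") is an assumption, since the scheme as described does not say how ties are resolved.

```python
from statistics import median


def classify_mfft(times, errors, time_median=None, error_median=None):
    """Assign each child to one of the four median-split categories."""
    t_med = median(times) if time_median is None else time_median
    e_med = median(errors) if error_median is None else error_median
    labels = []
    for t, e in zip(times, errors):
        slow = t > t_med          # assumption: a tie with the median counts as "below"
        inaccurate = e > e_med    # assumption: a tie with the median counts as "below"
        if inaccurate and not slow:
            labels.append("impulsive")
        elif not inaccurate and slow:
            labels.append("reflective")
        elif inaccurate and slow:
            labels.append("slow inaccurate")
        else:
            labels.append("fast accurate")
    return labels


# Hypothetical scores for six children (seconds to first response, total errors).
times = [4.0, 6.0, 15.0, 20.0, 5.0, 18.0]
errors = [12, 10, 3, 2, 4, 11]
labels = classify_mfft(times, errors)
print(labels)
```

The optional median arguments allow cut-off points taken from one sample to be applied to another.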


The MFFT has been used in this study since it continues to form the basis for most
research work in this area, but it should be noted that reservations about it have been
expressed by, among others, Block et al. (1974).


This study concerns the association of reading performance with reflection/impulsivity. 
Findings have so far been conflicting and almost entirely American. Kagan himself (1965), 
looking at first grade children, found an association between word recognition scores and 
MFFT scores: “The child who displayed long decision times and low error scores on MFF
was most accurate in recognition of words.” Hall and Russell (1974) confirmed this tendency
with third grade children. Kagan found a similar, although slightly more complex, situation 
when using a prose passage with second grade children. Denny (1974) cast doubts on such 
a relationship, finding that MFFT scores failed to distinguish between average and poor 
readers in the second to fifth grades. Hayes et al. (1976) also found a lack of relationship 
between word recognition ability and MFFT performance in mildly retarded children over a 
wide range (8 years-17 years). 


Although there is considerable evidence to show that girls have fewer reading problems
than boys, no clear pattern has been established in relation to sex differences and MFFT
performance. According to Messer (1976), the findings lack consistency but when differences
do appear, girls are shown to be slightly more reflective. 


METHOD 
Sample 
Main Sample: a complete year group at the top infant level (7+ years of age) in two 
infant schools. This involved 70 children, 33 boys and 37 girls. The two schools were 
both situated in the same city, one being designated a Social Priority school and the other 
not. This main group provided a basis for comparison for the group of poor readers. 


Poor Reader Sample: the poorest 10 per cent of readers in the top infant year group of 
nine schools in the same city. Five schools were Social Priority schools and four were not. 
In all, 42 children were involved, 28 boys (Mean Reading Age, 6 years 5 months) and 14 
girls (MRA 6 years 7 months). 


Procedure 

In the case of both samples, the Matching Familiar Figures Test and the Salford 
Sentence Reading Test were administered individually to each child, each test being given 
on a separate occasion. 


Analysis 

Initially each sample was analysed separately. Reading age, time to first response and 
number of errors were intercorrelated. A comparison was then made between the main 
sample and the poor readers for time, errors and the degree of correlation between these two 
measures and reading age. The performance of boys and girls was compared within both 
samples for errors and time. 


In addition, a median split was used to classify the main sample into the four categories 
of Impulsive, Reflective, Fast Accurate and Slow Inaccurate. The medians from the main 
sample were also applied to the sample of poor readers in order to categorise them similarly. 


One of the weaknesses of the median split method of categorisation has been held to be 
the arbitrary nature of categorisation around the median; the same scores could lead to a 
designation of impulsive in one sample and reflective in another. A further comparison was 
therefore made using a quartile rather than a median split. In this case, consideration was 
given only to the scores beyond the quartile where categorisation could be made with the 
greatest confidence. 


A comparison was also made of the proportion of each sex in the various categories as 
determined by both the median split and the quartile split. 


RESULTS 


The correlations between reading age and errors, and time and errors are negative and
those between reading age and time positive for both samples (Table 1). Only the time and
errors correlations are statistically significant, however (P < 0.01), the values for both
samples, -0.437 and -0.545, being close to Messer’s reported median r of -0.48. None
of the differences between correlations for the two samples is significant.
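
These significance statements can be illustrated with two standard calculations: a t test of a correlation against zero, and Fisher's z transformation for comparing two independent correlations. The source does not say which tests were used, so both are assumptions. Pairing -0.437 with the main sample (n = 70) and -0.545 with the poor readers (n = 42) is also an assumption, and the function names are hypothetical.

```python
from math import atanh, sqrt


def t_for_r(r, n):
    """t statistic (n - 2 df) for testing a Pearson r against zero."""
    return r * sqrt((n - 2) / (1 - r * r))


def z_for_difference(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent correlations."""
    return (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


# Time-errors correlations; pairing each r with a sample size is an assumption.
t_main = t_for_r(-0.437, 70)
t_poor = t_for_r(-0.545, 42)
z_diff = z_for_difference(-0.437, 70, -0.545, 42)
print(f"t = {t_main:.2f} and {t_poor:.2f}; z for the difference = {z_diff:.2f}")
```

On these assumptions both t values are about 4 in absolute size, well beyond the two-tailed 0.01 critical values (roughly 2.7 for these degrees of freedom), while the z for the difference is about 0.7 and falls short of 1.96. The conclusions are the same if the two sample sizes are swapped.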


The comparison of overall means for the two samples showed that the poor readers
responded more quickly (P < 0.01) and that they made more mistakes (P < 0.01) than the
main sample (Table 2). Girls showed a general tendency towards greater reflectivity in that
they responded more slowly and made fewer errors than boys in both samples but only with
time in the main sample was the difference statistically significant (P < 0.01).


TABLE 1

INTER-CORRELATIONS OF TIME, ERRORS AND READING AGE FOR BOTH SAMPLES

Variables        MFFT time    MFFT errors    Reading age
MFFT time            -          -0.437*         0.216
MFFT errors       -0.545*          -           -0.210
Reading age        0.106        -0.223            -

Poor readers in italics.
* P < 0.01.


A comparison of the percentages which fall into the four categories (Impulsive,
Reflective, Fast Accurate and Slow Inaccurate) as determined by a median split revealed
that the major overall differences between the two samples lie in the impulsive/reflective
areas (Table 3). A considerably larger percentage (P < 0.02) of the poor readers are impulsive.
This is at the expense of the reflective category where fewer of the poor readers are found
(P < 0.02), the proportion of the two samples in the FA and SI categories being similar.


A consistent tendency for proportionately more boys than girls to be classified as 
impulsive and more girls than boys to be classified as reflective was observable in both 
samples (Table 4). In overall terms the differences were statistically significant (P < 0.01 for
impulsives, P < 0.02 for reflectives) but differences within samples were not so. No analysis
by sex was made for the FA and SI categories as numbers were so small. 


Categorisation according to a quartile split showed 36 per cent of the poor readers to be
extreme impulsives compared with only 9 per cent of the main sample (P < 0.01). Where
the difference between the sexes was statistically significant, two mirror images emerged:
proportionately more boys than girls are found beyond the quartiles for impulsivity in the poor
reader sample (P < 0.05) and proportionately more girls than boys are found beyond the
quartiles for reflectivity in the main sample (P < 0.05) (Table 5).


TABLE 3

PERCENTAGE OF EACH SAMPLE IN THE IMPULSIVE, REFLECTIVE, FAST ACCURATE AND SLOW INACCURATE
CATEGORIES DETERMINED BY MEDIANS OF MAIN SAMPLE (numbers in brackets)

                                                    χ² significance
Variable            Main sample    Poor readers          P <
Impulsives            36 (25)        60 (25)             0.02
Reflectives           36 (25)        14 (6)              0.02
Fast accurates        14 (10)        14 (6)              NS
Slow inaccurates      14 (10)        12 (5)              NS


TABLE 4

PERCENTAGE OF EACH SEX IN THE IMPULSIVE AND REFLECTIVE CATEGORIES FOR BOTH
SAMPLES (median criterion)

                      Impulsives                  Reflectives
Group           Boys   Girls   (χ²) P <     Boys   Girls   (χ²) P <
Main sample      45     27       NS          24     46       NS
Poor readers     68     43       NS          11     21       NS
Total            56     31      0.01         18     39      0.02


TABLE 5

PERCENTAGE OF EACH SEX IN THE IMPULSIVE AND REFLECTIVE CATEGORIES FOR BOTH
SAMPLES (quartile criterion)

                      Impulsives                  Reflectives
Group           Boys   Girls   (χ²) P <     Boys   Girls   (χ²) P <
Main sample      12      5       NS           6     27      0.05
Poor readers     46     14      0.05          0      0       -


DISCUSSION


Kagan’s (1965) general findings of an association of reflection/impulsivity with reading 
ability are confirmed here in that there is a tendency for reading ability to be associated 
negatively with the making of mistakes and positively with the taking of time. All this is
true, too, of the sample of poor readers. In the case of both samples, the correlations of
reading ability with time and errors are rather lower than Kagan found although direct 
comparison is difficult, as his correlations were established in more complex situations. 


A comparison between the two groups reveals that the poor readers are substantially 
more impulsive than the main sample, whether this is judged by a comparison of time and 
error means or by categorisation. If those who fall below the lowest quartile on time and 
the highest on errors can be considered to be at the extreme of impulsivity, it is important to 
note that over a third of the poor readers are in this category. Boys show themselves con- 
sistently to tend more towards impulsivity than girls, a factor which may help to account for 
the fact that they are more likely to experience reading difficulties than girls. 


Since impulsivity appears to be negatively related to reading success, it is important for 
teachers to be alert to this aspect of behaviour. Reading, particularly in the early stages, 
presents a situation of uncertainty where consideration of alternatives may be crucial in 
arriving at meaning. There is some indication (e.g., Barstis and Ford, 1977; Cohen and 
Przybycien, 1974) that it is possible to modify the style in which children habitually respond 
in such situations. 
SCALING SMALL GROUPS AT THE O-GRADE STAGE
By D. A. WALKER*


SUMMARY. The Dunning Report, while advocating the use of scaled internal assessments in the
award of certificates at the 16-year-old level in Scotland, has doubts about its practicability where
the number of presentations from a school in a particular subject is small. The investigation
reported in this article may dispel some of these doubts and also throw some light on the
inferences that may, and may not, be drawn from the size of the correlation between internal
and external assessments.


INTRODUCTION 


A committee appointed by the Secretary of State for Scotland has considered questions 
associated with the assessment of pupils completing four years of secondary education and 
with the certificates that might be awarded on completion of these courses. One of the main 
recommendations in their report Assessment for All (Dunning, 1977) is that the awards 
should not be entirely dependent on scores made in external examinations but should be 
based partly on internal assessments provided by the teachers of the pupils. It is accepted 
by the Committee, and by most teachers, that these internal assessments would require 
standardisation of some kind to make them comparable between schools, and the Committee 
favour a scaling procedure, using the scores in the external examination as the basis for 
scaling the school's assessments in the subject. 


The Committee have reservations where the number of pupils in the group presented by 
the school in a particular subject is small, and propose that in these cases some type of 
moderation be used. Moderation is the term used for the method in which inspectors or 
teachers from other schools examine the work of the pupils and make what they, from their 
experience as examiners, consider to be appropriate adjustments to the internal assessments. 
It is a process which is time-consuming and probably expensive, and the question of where to 
draw the line between ‘small’ and ‘not small’ is therefore an important one.


The Dunning Report does not specify what size of group is to be considered as small for
this purpose. One practical guide arises from a consideration of what have been described 
as ‘flop’ scores. In the process of scaling, the mean of the internal assessments is adjusted 
to become equal to the mean of the external marks scored by the group. If a member of the 
group makes an unexpectedly low score in the external examination, the group average is 
depressed and hence the average of the scaled assessments is also depressed. The point is 
that it is not only the unfortunate low-scoring pupil who suffers but also his or her fellows in 
the group. If the number is ten or more, the average effect on each is likely to be small, but 
with a number as low as five the effect is correspondingly greater. 


The reader may ask whether this effect does not operate with any low score and not 
merely the unexpectedly low score. The answer is that it does not; if the low score in the 
external examination corresponds to a low internal assessment it does not affect the scaled 
estimates of the remainder of the group. An example illustrating this point is given later. 


In the present Scottish SCE O-grade situation there is a substantial number of subject- 
groups with fewer than ten candidates in a school. In 1977 they formed about a quarter of 
all the groups presented for the examinations, and that proportion is likely to be increased 
if the recommendation of the Dunning Report for the provision of separate ‘Credit’,
‘General’, and ‘Foundation’ levels is accepted. It seemed, therefore, wise to inspect the
data from a representative sample of these groups with a view to ascertaining whether the
incidence of ‘flop’ scores was sufficiently large to warrant special measures being taken to
neutralise their effect. 

METHOD 

The Scottish Certificate of Education Examination Board made available to the author
the O-grade marks of a representative number of groups each containing between five and
nine pupils. The sample was drawn from the whole of Scotland and covered presentations
in Accounting, Art, Biology, English, German, Home Economics, Latin, Mathematics and
Modern Studies. The schools concerned were then asked to provide the school marks on
which were based the ranks submitted to the Board; these marks were treated as the internal
assessments. In a few cases the marks were not available, but almost all of the schools
provided data which enabled the necessary calculations to be completed for 74 groups drawn
from 64 schools.


* 15 Ravelston Garden, Edinburgh, EH4 3LD.


For each group the O-grade marks were plotted against the school marks and the scaling
line was drawn. This is the line

$$I_s = \bar{E} + (I - \bar{I})\,\frac{s_E}{s_I}$$

giving the scaled assessment $I_s$, where $I$ is the school mark or internal assessment, $\bar{I}$ its
average for the group and $s_I$ its standard deviation in the group, and $\bar{E}$ and $s_E$ are the average
and standard deviation of the O-grade marks or external assessments achieved by the group in the
subject. Each graph was then inspected for evidence of ‘flop’ scores. There is at present
no agreed procedure among statisticians for the detection of what are called outliers, and the
classification of scores as ‘flops’ was therefore subjective to some degree. On the other hand,
a set of points all but one of which indicated fairly clearly a straight line correspondence,
while the remaining one was well below that line, strongly suggested a ‘flop’ score.
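
The scaling line can be sketched in code as follows. This is a minimal illustration, not the procedure actually used by the Board: the function name `scale_assessments` and the marks are hypothetical, and the population standard deviation is assumed (the ratio of the two standard deviations is the same whichever divisor is used, provided it is used for both).

```python
from statistics import mean, pstdev

def scale_assessments(internal, external):
    """Map internal marks onto the external scale by matching mean and spread."""
    i_bar, s_i = mean(internal), pstdev(internal)
    e_bar, s_e = mean(external), pstdev(external)
    if s_i == 0:
        raise ValueError("internal marks have no spread; the scaling line is undefined")
    # Scaled mark = external mean + (deviation from internal mean) * s_E / s_I
    return [e_bar + (i - i_bar) * s_e / s_i for i in internal]

# Hypothetical marks for a group of six pupils (not data from the study).
school_marks = [48, 55, 60, 62, 70, 77]
o_grade_marks = [41, 52, 50, 58, 66, 75]
scaled = scale_assessments(school_marks, o_grade_marks)
print([round(s, 1) for s in scaled])
```

The scaled marks keep the school's order of merit but take on the mean and standard deviation of the group's O-grade marks.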


RESULTS 
The findings may be summarised thus: 


(1) There was a high level of correlation between the school assessments and the O-grade 
marks. The distribution of correlation coefficients is shown in Table 1. 


(2) There were only two cases of ‘flop’ scores, one in English (seven pupils in the group)
and one in Modern Studies (nine pupils in the group). In each case the O-grade mark of the
pupil concerned was about 20 per cent (in terms of the usual distribution of school marks)
below the value appropriate to his or her school mark and the scaling line determined by the
other pupils in the group.


TABLE 1

DISTRIBUTION OF CORRELATIONS BETWEEN SCHOOL MARK AND O-GRADE MARK

Correlation Coefficient x 100

Subject  Negative  00-09  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-  Totals
Accounting 1 1 2 1 2 7
Art 1 1 1 2 1 1 7
Biology 1 1 3 1 6 12
English 1 2 3 4 10
German 1 5 6
Home Economics 1 2 2 1 4 10
Latin 1 1 2 2 6
Mathematics 1 1 2 2 6
Modern Studies 2 1 1 1 5 10

Totals 1 1 1 2 2 4 1 9 10 12 31 74


TABLE 2

SCHOOL MARKS, O-GRADE MARKS AND SCALED MARKS FOR ONE GROUP

                                           Scaled school mark
Pupil   School mark   O-grade mark   Column 1   Column 2   Column 3


The results for the group with the ‘flop’ score in English provide material for several
comments and are therefore given in Table 2, in which column 1 gives the scaled marks when
all O-grade marks are taken into account, column 2 gives the scaled marks when T's score of
13 is regarded as a ‘flop’ and ignored in the calculations, and column 3 gives scaled marks
when pupil V is also omitted from the calculations.


DISCUSSION 


It will be observed that the exclusion of T's O-grade score makes a considerable difference 
to the scaled marks of pupils S, U and V. The fact that the exclusion of low-scoring V makes 
practically no further difference illustrates the comment made in the fourth paragraph of 
this article. 
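
The contrast between a ‘flop’ and a consistently low score can be illustrated with invented figures. The marks below are hypothetical and are not those of pupils S to V in Table 2; the function `scale` and its `exclude` argument are likewise assumptions made for the sketch. Every pupil is given an O-grade mark 5 below the school mark, except one who scores far lower than expected.

```python
from statistics import mean, pstdev

def scale(internal, external, exclude=()):
    """Scaled marks for every pupil, with the line fitted on pupils not in `exclude`."""
    keep = [k for k in range(len(internal)) if k not in exclude]
    i_fit = [internal[k] for k in keep]
    e_fit = [external[k] for k in keep]
    i_bar, s_i = mean(i_fit), pstdev(i_fit)
    e_bar, s_e = mean(e_fit), pstdev(e_fit)
    return [e_bar + (i - i_bar) * s_e / s_i for i in internal]

# Invented marks: pupil 3 'flops' (27 instead of the expected 55); pupil 0 is low on both.
school = [40, 50, 55, 60, 65, 75]
o_grade = [35, 45, 50, 27, 60, 70]

column_1 = scale(school, o_grade)                  # all O-grade marks used
column_2 = scale(school, o_grade, exclude={3})     # 'flop' ignored
column_3 = scale(school, o_grade, exclude={0, 3})  # low-scoring pupil 0 also ignored

for row in zip(school, o_grade, column_1, column_2, column_3):
    print("%3d %3d %6.1f %6.1f %6.1f" % row)
```

In this constructed case, ignoring the ‘flop’ raises the scaled mark of the lowest pupil by about ten marks, whereas also ignoring the consistently low pupil leaves every scaled mark unchanged, because that pupil lies on the line defined by the rest of the group.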

If this sample of small groups may be taken as representative of the small groups likely
to be presented in Scotland, it would seem that there is little to fear about the frequency of
‘flop’ scores at this stage. The technique which was devised and used for small groups at
the stage of transfer from primary to secondary education, when the pupils were about 12
years of age, appears to be required in only a very few cases when they have reached the age
of 16. This is a comforting conclusion as the technique involved the scanning of graphs
(McIntosh et al., 1962), the number of which might be overwhelming in a national examination.


The study has also thrown further light on the relation between ‘flop’ scores and the
size of the correlation of internal with external assessments. The correlation of school marks
with O-grade marks for the seven pupils in Table 2 is 0.70. A ‘flop’ score may therefore lie
concealed behind a moderately strong correlation.


A third point concerns the relation between this correlation and the scalability of the
school assessments. The Schools Council Examinations Bulletin 37 (1977) refers to a
correlation of 0.5 or 0.6 being frequently quoted by Examination Boards in England and
Wales as the minimum acceptable figure if linear scaling is to be used, although the Bulletin
does go on to suggest that further investigations are required. This study has shown that
where the number in a group is small a low correlation need not cause the rejection of the
internal assessment as unworthy to be scaled. A low correlation may be a consequence of
the relative homogeneity of the particular group, making it difficult for either the school or
the external examination to produce a reliable order of merit. For example, five pupils
presented by a school may have been given internal assessments 61, 62, 63, 64, 65 and obtain
marks 63, 61, 65, 62, 64 in the external examination. The correlation is relatively low
(r = 0.3), but the average assessment is the same in both sets, as also is the standard deviation.
If the school's ranking is accepted as valid, the internal assessments can be accepted without
change to form the scaled assessments. If there had been in the group a sixth pupil who had
been given an internal assessment 50 and had scored any mark below 57, the correlation
would have exceeded 0.9, showing how dependent the coefficient is on heterogeneity. The
negative correlation in Table 1 came from a group which was closely bunched both in school
marks and O-grade marks.


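
The arithmetic of the five-pupil example above can be checked directly. The sketch below uses the marks quoted in the text; the helper `pearson_r` is written out by hand simply to keep the example self-contained, and the sixth pupil's external mark of 56 is one illustrative value below 57.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of marks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

internal = [61, 62, 63, 64, 65]
external = [63, 61, 65, 62, 64]

r_five = pearson_r(internal, external)                 # homogeneous group of five
r_six = pearson_r(internal + [50], external + [56])    # one much weaker pupil added

print(round(r_five, 2), round(r_six, 2))
```

The five bunched pupils give r = 0.3, while adding a single pupil who is clearly weaker on both assessments lifts the coefficient to just over 0.9, although nothing has changed in the agreement among the original five.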
CONCLUSION 


While these results are encouraging in showing that ‘flop’ scores are unlikely to present
large scale problems at the O-grade stage, it would be advisable to repeat the study, using 
internal assessments specifically designed for the purpose and making further inquiries in the 
schools concerned where the graphs presented unusual features. 
BOOK REVIEWS


Bassey, M. (1978). Nine Hundred Primary School Teachers. Windsor: NFER,
pp. 127, p. £4.25.


This is a demographic study, the declared purpose of which is to describe major practices 
of primary school teachers in a Nottinghamshire sample in order to increase teacher aware- 
ness of the range and variety of practices and thereby widen the base from which their class 
decisions are made. The study is in fact a quantitative tabulation of teacher-reported 
practices in relation to the organisation and management for learning and not of learning or 
teaching methods per se. 


Efficient use of school time is important. The interviewers attempted to elicit an 
operational (i.e., structural and functional) analysis of allocation of the school day and have 
included head teacher delegation of responsibility within it. The four teaching methods 
identified early in the Junior section are in fact organisational categories. 


Whether the questions asked are the salient ones is a problem for all such studies. The 
criterion for selection in this study was teacher interest, and to the researcher’s credit 
questions are all concerned with classroom realities, but the focal viewpoint taken is mana- 
gerial. One therefore finds that in the teaching of reading to infants, teacher expectations 
are sought in relation to the child reading aloud to the teacher. This is a particularly 
important aspect for demands upon teacher time but other equally important aspects of the 
teaching of reading are not included, many of which may be taught efficiently in group 
situations. It is at points such as this that organisational strategies and methodology meet. 


However, when the report strays into teaching methodology it is least convincing and
data are strained, as instanced by the interpretative comment in relation to methods used for
the teaching of reading to young children, that ‘the word recognition method known as
look and say, is the one most commonly used by these teachers’. One suspects that in
relation to a particular child, these categories are not discrete but complementary; possibly
an initial look and say and sentence approach followed by phonic encoding and decoding.
A refined definition of ‘younger’ (a somewhat arbitrary and respondent inferential category
in this report), is essential to an informed interpretation of these data.


The overall lack of theoretical reference and particularly the omission of an index, list 
of tables and bibliography make it difficult to address questions to the data as presented. 
Limitations of the data need stating: in particular the study fails to provide an objective 
means of checking the possible differences between supposed and actual behaviour of 
teachers; neither is there an attempt to assess inter-reliability of administration of the 
questionnaire. The style of some questions may be evocative, for example the inclusion of
‘typical pupil’ may over-generalise the responses.


Although the work lacks critical comment upon recorded findings and assessment 
procedures, overall encapsulated in the report is a wide ranging, carefully tabulated quanti- 
tative statement reflecting how a sample of teachers retrospectively view the allocation of 
teaching time, the organisational procedures involved, supportive media and means of 
testing. The inclusion in the survey of such practical priorities as marking, display and 
disposal of children's work, visits and contact with parents means that the study provides a 
useful reference for teachers in relation to their own practice. 


A word of caution is needed. The predominant concern of the reported study is the
organisation and management of learning and the reader should therefore stay both within
this frame of reference and the parameters of the research design or he may emerge with
ill-founded assumptions about teaching methods.


BETTY WILKINSON. 


RUTTER, M., MAUGHAN, B., MORTIMORE, P., and OUSTON, J. (1979). Fifteen Thousand
Hours: Secondary Schools and their Effects on Children. London: Open Books,
pp. viii+279, p. £3.50, c. £7.50.


As is implied by the title, children spend about 15,000 hours during their formative years
in schools and the research reported in this book (which is part of a larger programme by
Professor Rutter and his colleagues) seeks to examine the impact which twelve non-selective,
but previously selective, London secondary schools have upon the development of their
pupils. The longitudinal research design of this substantial study has demanded the
investment of considerable financial and personnel resources and thus has many of the
qualities which most educational researchers can only day-dream about. A mass of data
were collected describing the ‘intake’ characteristics of the cohorts of pupils entering the
twelve schools, the qualities and characteristics of the schools themselves and their
environments, and the subsequent ‘outcome’ characteristics of the pupils. Data collected by
standardised tests were supplemented with interviews and observations in the schools. Thus 
it was possible to investigate the effect of the educational process upon cohorts of children 
as they progressed through different school environments. 


The reaction of the ‘quality press’ to the publication of this book was to proclaim the
news that schools are important; that while much contemporary research ‘shows’ that
schools don’t matter, this study demonstrates that schools do have a significant effect upon 
their pupils. And in a word the findings are that when differences in input characteristics 
are accounted for, different schools vary in their effects upon children’s behaviour, attendance, 
exam success and delinquency. Moreover, the evidence suggests that there is a causal 
relationship between characteristics of schools and variations in their outcomes. It must be 
emphasised at the outset that the authors resist the temptation to create ‘news’. The 
analyses of their evidence are meticulous and circumspect. They avoid any temptation to 
overstate their case. They remind the reader that they are continually making inferences 
from their data and, when appropriate, they explore alternative ways of evaluating their 
evidence. Their determination to signal to the lay reader that they are interpreting data and 
to indicate where their analyses become more speculative is a commendable feature of this 
report which other researchers could adopt. Indeed for the reviewer their attention to 
scholarship is somewhat galling. For example they resist the temptation to make inferences 
about causality from log linear models which are used appropriately to describe parsimoni- 
ously the interactions between variables, and they avoid the error, not uncommon among 
psychologically orientated educational researchers, of making inferences about the behaviour 
of persons from group or ecological data. 


One point of criticism which must be raised is that initially they set up something of a 
‘straw man’ to attack which is inappropriate and which detracts from the quality of the 
book. They begin by reviewing the well-known, large-scale American studies such as
Coleman’s Equality of Educational Opportunity and Jencks’s Inequality in order to establish
the point that schools do not make a difference to children’s development. However, the
authors fail to mention the corpus of critical literature which surrounds these studies or to
reflect accurately the theoretical context in which they are located. At the risk of
simplification, these American macro-sociological studies seek to assess the impact of the educational
system as a social institution and to ask whether manipulating the education system as an
‘independent variable’ has an effect upon the life chances of children. Rutter and his
colleagues remind us that schools, as individual institutions, can have a differential influence 
upon their pupils. But if, for example, a society has a high rate of unemployment and cannot 
offer opportunities to school leavers or if there are substantial variations in the distribution 
of wealth, then schooling may have very little effect upon children’s life chances and other 
societal factors may have a more dominant influence in shaping the child’s development. 


If schools do exert an influence it cannot be due only to their general ethos and character- 
istics. Teachers provide the vital link between the institutional structure and children’s 
learning and behaviour. The more intriguing issue is not how and why schools influence 
children's development but the impact of teachers on children. This problem is recognised 
by the authors but discussed within one and a half pages. One has every sympathy for their 
few guarded remarks on the effective teacher; the vast literature in this field is equivocal and 
riven with theoretical and methodological problems. However, while they may have had to 
chance their arm, a more extensive discussion about how their research could contribute to 
the teacher effectiveness debate would have been most instructive. For example in a few 
lines they inform us that inexperienced teachers are unsuccessful at class management and 
that the extent to which they can improve these skills is partially dependent on the schools 
they work in. These observations have fascinating implications for the training of teachers, 
the placement of probationer teachers, and the role of in-post teachers in the in-service
training of their inexperienced colleagues.
There is a danger that this book will become associated with a message and that in the
next few years it will be merely cited in thousands of examination scripts to ‘show’ that
schools do matter. One can only hope that it will be included on education reading lists
for a more important and worthwhile reason. Upon careful reading its impact is not so
much the news it creates but the impression it gives of educational researchers ‘at work’,
recognising that they have a problem which raises conceptual and methodological issues and
carefully describing the alternative ways in which they can analyse their data: a group of
researchers who accept that they are fallible and who are cautious and not dogmatic in the
interpretation of their evidence.

DAVID MCNAMARA.


WALL, W. D. (1977). Constructive Education for Adolescents. London and Paris:
George Harrap and UNESCO, pp. xi+333, £11.00.


In 1952 UNESCO organised a conference on “Education and the Mental Health of
Children in Europe”. The proceedings were written up by Professor Wall in the volume
Education and Mental Health which was published in 1955. That book was warmly reviewed 
in this journal and for many years held its place as a successful and widely-read text. 


In view of the demand for the book UNESCO invited Professor Wall to undertake a 
complete revision of it. He decided that the increased complexity of the topic required the 
work to be divided and Constructive Education for Children, the first volume of what has now 
become a trilogy, was published in 1975, a quarter of a century after the initial conference. 
Constructive Education for Adolescents, the second volume, is the subject of this review and 
a third volume, dealing with the special problems of the handicapped and deviant, has just 
been published. 


This scene-setting is important in order better to appreciate content and approach. 
The original conference was a very wide-ranging affair and breadth of coverage is still a 
feature of this volume. There can be few contemporary texts offering serious treatment of
topics as diverse as cognitive growth, cross-sexual roles, integrated curricula, effectiveness of 
CD Eee education, alcohol abuse and selection and training of psychologists, for 
example. 


The contributions to the conference were offered by representatives of a large number 
of disciplines. Although one might expect a book from Professor Wall to be a text in 
educational psychology, this is far broader. He brings together the views of the sociologist, 
the anthropologist and the administrator, for example, and integrates them with those of 
the educational psychologist. 


Moreover the conference was a European conference. Many students can be forgiven
for believing that apart from the activities of a well-known Genevan, the study of education
in general and of educational psychology in particular is carried on almost exclusively in
Britain and in North America. The articles which appear in this journal are usually supported
by references which reinforce this view. But this book, like its predecessor, constantly refers
to the rich vein of European work often published not in English but in the major continental
languages.

The position of the book as the centre volume of a trilogy is worth remarking. The 
first five chapters of the first volume, Constructive Education for Children, discuss the historical 
and social context within which it is the author’s view that educational issues should be 
considered, and the reader will find this material useful background to the first few chapters 
of this volume, which deal with theories of adolescence. The final chapter of this volume, 
with its emphasis on psychological services, directs the reader towards the major concerns of 
the third volume, which has just recently appeared. The core of this book is the group of 
five central chapters which deal with the school and the adolescent (two chapters), problems 
of secondary education, informal and recurrent education and mental health and teaching. 


The purpose of the book is to examine the education process in the light of the need to
adapt to and control rapid social change. For the author the key lies in developing a concept
of mental health defined as dynamic adjustability, rather than adjustment. The author’s
preface outlines his own personal belief that education for dynamic adjustability is also the
route through which participatory democracy and a juster social order can be best achieved.
It is necessary to mention this belief since it gives an intimation of the flavour of the book.
And Wall’s view that schools must change in order to accept and accommodate to this
educational aim comes through at many points.


The book is an integrative text. It is concerned with major educational issues, not a 
detailed analysis of a small sector of education. Inevitably it does not touch on all educational 
developments. For example there is no reference to behaviour modification or interaction 
analysis—these are techniques and the book is not concerned with methodology. And it is 
salutary, at a time when educational disciplines are developing their own individualities, to 
be reminded through this book of an older view that education is a process best understood 
through joint illumination from a variety of academic standpoints. Its concern with major 
issues means that the detailed supporting research findings and notes are relegated to the 
ends of the chapters. Indeed, on first leafing through the book the reader is struck by the 
absence of illustration—there is no picture, no diagram, no table of experimental findings 
in the main body of the text and the mass of print is largely unrelieved. 


Yet it is surprisingly easy to read though difficult to review. It offers a distillation of 
wisdom and experience that repays frequent savouring. It is a book to return to often. It 
should be read by all educators, whatever their academic background, who care about the 
process of education, its purposes, its eternal questions and above all who care for the children 
and young people that education serves. This is an universal book. There are few who 
could have written it and none who could have written it as well as Professor Wall. It
deserves to be at least as successful as its predecessor of 1955. 


Panir WILLIAMS. 


HEARNSHAW, L. S. (1979). Cyril Burt: Psychologist. London: Hodder and
Stoughton, pp. 370, £8.95.


Few biographies written about psychologists, or anyone else for that matter, can have 
posed so many problems for the author as this one. The brilliant, unscrupulous, charming, 
childish Burt is as difficult a biographical subject as can be imagined. Added to this are the 
allegations of fraud levelled against Burt while the book was being researched. The
subsequent controversy, Professor Hearnshaw informs us in the preface, ‘rendered my task both
unexpectedly different, and far more difficult than I had anticipated when I undertook it’.
This exemplifies the measured phrases and balanced treatment which characterise the whole
book. 


The task of this important book is to draw and document fully a picture of Burt which 
accounts for the innumerable inconsistencies and glaring incompatibilities in his life and 
work. If Burt was such a competent statistician, how could he do such an incompetent job 
of ‘fudging’ the papers on twins, of which his critics were to make so much after his death?
If he was not, how was he able to make such contributions as he did to factor analysis? 
Again, if Burt had a bad personality, how was he able to attract such warm recommendations 
and tributes from highly eminent colleagues, some of whom, like Spearman, were hardly 
famous for distributing lavish praise? But why, then, did the authorities at University 
College, London take the extraordinary step after his retirement from the professorship of 
psychology of debarring Burt from the premises? 


Professor Hearnshaw’s answers to these questions are in many ways more damaging to
Burt than the accusations of his detractors through being fully backed up by documentation
culled from a wide range of sources. The aim throughout the book is to examine the evidence
first and to draw conclusions later in contrast to most recent writers about Burt who seem
to start by announcing their attitudes and then look for facts to support them. The former
approach is of course much more convincing and allows Burt’s shortcomings to be
devastatingly exposed (such as his attempt to rewrite the early history of factor analysis) while
also bringing out the range and depth of his contributions to statistics and psychology
without recourse to abstruse technicalities or more than a minimum of jargon.


There is a basic difference between the treatment given by the book to Burt as a man 
and to Burt as a psychologist. As regards the former, Professor Hearnshaw has probably 
had access to all evidence which could be relevant and it is therefore justifiable to attempt 
to formulate final conclusions. This is indeed the great strength of the book and it seems
unlikely that evidence will appear in the future which will affect seriously the conclusions 
reached. 


As regards Burt the psychologist, however, it will not be possible for some years to say 
with finality just what his influence on the discipline will eventually be. Accordingly, this 
part of the book consists of an exposition of the main lines of Burt’s work with particular 
attention to the type of psychology and methodology which he favoured and to which he 
devoted much of his life. No attempt is made to evaluate this historically although some 
space is given over to the work of recent psychologists with the aim of deciding whether 
concepts proposed by Burt are actually regarded as tenable by contemporary writers. This 
is particularly marked in chapter four where Burt’s view of intelligence as innate, general, 
cognitive ability is handled in this way. Such discussion is relevant only to present debates 
within psychology and as such has no place in a work of historical scholarship, which is 
undoubtedly what this book is. It is the psychologist’s job, not the historian’s, to decide 
whether concepts are justifiable or not and the two types of argument, historical and 
scientific, should be kept more distinct than is always the case here, especially as “Burt’s
work is by now mainly a concern for historians” (p. ix).


But again, this does not affect the main achievement of this book, which is to explain 
the apparently inexplicable. In particular, it has the beneficial effect of putting into context 
much of the ideologically-based polemic written about Burt since his death. The biography 
is of course compulsory reading for anyone who even claims an interest in Burt, much less 
claims to know anything about him. In view of the difficulty of the subject matter on which
it is based it will certainly and deservedly become the standard work on the life of this unique
individual.


So what is Professor Hearnshaw’s conclusion? ‘The verdict must be, therefore, that
at any rate in three instances, beyond reasonable doubt, Burt was guilty of deception. He
falsified the early history of factor analysis ...; he produced spurious data on MZ twins;
and he fabricated figures on declining levels of scholastic achievement. Moreover, other
material on kinship correlations is distinctly suspect. It would be tempting to go further
and maintain, with Kamin, that all Burt’s work from the beginning was scientifically
worthless. . . . Nevertheless such a judgment would be one-sided, and less than just. It
would fail to account for the esteem in which Burt was held, almost, if not quite, universally
in the early stages of his career, and by many up to the time of his death. ... It would
disregard the assessment of contemporary experts in their appraisal of his work, and it would
give insufficient weight to his many scholarly and practical achievements’ (p. 259).


The main charges against Burt have thus been substantiated in this balanced and 
scholarly assessment. But to understand how Burt could have risked his reputation by 
frauds which were published for all to see is the main task of this fascinating and readable 
book. The reader is left with a picture of, in Valentine's words, “ one of the half-dozen 
greatest psychologists this century has produced ”, yet this was also a man who seems to 
have developed in the final phase of his life a “ marginally paranoid condition ” which led 
him “ to cheat rather than see his opponents triumph ”. 


“ Human beings are not simple; and if Burt, too, had feet of clay, that should not be 
regarded, even by his admirers, as totally incredible. Before passing judgment, we must 
seek to understand ” (p. 262). 


Given the generally very high quality of the scholarship, it is surprising to find a number 
of minor errors of detail; for instance, Burt's literary executor is Grete Archer, not Gretl; 
one of Burt's rivals for the LCC job in 1912 was W. H. Winch, not W. A. Winch; the 
reference on page 41 to an Australian industrial psychologist, W. R. Muscio, I assume is to 
Bernard Muscio (unless there were two of them); and there are inevitable problems with 
the footnotes—for one, numbers 40 and 44 on page 237 cannot both be correct. Also, some 
of the claims made about Burt are a shade sweeping, such as that he was “ the first to function 
as a psychologist outside the walls of a university ”. In fact, recent research has disclosed 
that the above-mentioned W. H. Winch was functioning as exactly that, though in an unpaid 
capacity, for several years prior to Burt’s appointment. Similarly, an over-concentration on 
Burt’s role, as is possibly inevitable in a biography of this type, causes certain impressions 
to be conveyed. This is most noticeable in the discussion on the influence of psychologists 
on educational policy, which contains nothing to indicate that any psychologist other than 
Burt was ever consulted at all. Thus Godfrey Thomson, whose Moray House tests were of 
seminal importance in this respect, does not get a single mention in chapter seven, which is 
given over to this topic. 


Finally, I should like to point out that the list of Burt’s writings provided on pp. 321-338 
contains certain errors for the years before 1940. This is primarily my fault. My list, on 
which Professor Hearnshaw’s is based in part for this period, has now been updated and 
will appear as an appendix to an article by Dr. G. Sutherland and myself forthcoming in the 
journal History of Science. This list and Professor Hearnshaw's should be consulted 
together since each contains a few items missing in the other. 

STEPHEN SHARP. 